Archive of posts filed under the Statistical computing category.

“Developers Who Use Spaces Make More Money Than Those Who Use Tabs”

Rudy Malka writes: I think you’ll enjoy this nice piece of pop regression by David Robinson: developers who use spaces make more money than those who use tabs. I’d like to know your opinion about it. At the above link, Robinson discusses a survey that allows him to compare salaries of software developers who use […]

SPEED: Parallelizing Stan using the Message Passing Interface (MPI)

Sebastian Weber writes: Bayesian inference has to overcome tough computational challenges and thanks to Stan we now have a scalable MCMC sampler available. For a Stan model running NUTS, the computational cost is dominated by gradient calculations of the model log-density as a function of the parameters. While NUTS is scalable to huge parameter spaces, […]

Workshop on reproducibility in machine learning

Alex Lamb writes: My colleagues and I are organizing a workshop on reproducibility and replication for the International Conference on Machine Learning (ICML). I’ve read some of your blog posts on the replication crisis in the social sciences and it seems like this workshop might be something that you’d be interested in. We have three […]

Using external C++ functions with PyStan & radial velocity exoplanets

Dan Foreman-Mackey writes: I [Mackey] demonstrate how to use a custom C++ function in a Stan model using the Python interface PyStan. This was previously only possible using the R interface RStan (see an example here) so I hacked PyStan to make this possible in Python as well. . . . I have some existing […]

Another serious error in my published work!

Uh oh, I’m starting to feel like that pizzagate guy . . . Here’s the background. When I talk about my serious published errors, I talk about my false theorem, I talk about my empirical analysis that was invalidated by miscoded data, I talk about my election maps whose flaws were pointed out by an angry […]

Hello, world! Stan, PyMC3, and Edward

Being a computer scientist, I like to see “Hello, world!” examples of programming languages. Here, I’m going to run down how Stan, PyMC3 and Edward tackle a simple linear regression problem with a couple of predictors. No, I’m not going to take sides—I’m on a fact-finding mission. We (the Stan development team) have been trying […]

Theoretical Statistics is the Theory of Applied Statistics: How to Think About What We Do

Above is my talk at the 2017 New York R conference. Look, no slides! The talk went well. I think the video would be more appealing to listen to if they’d mixed in more of the crowd noise. Then you’d hear people laughing at all the right spots. P.S. Here’s my 2016 NYR talk, and […]

Visualizing your fitted Stan model using ShinyStan without interfering with your RStudio session

ShinyStan is great, but I don’t always use it because when you call it from R, it freezes up your R session until you close the ShinyStan window. But it turns out that it doesn’t have to be that way. Imad explains: You can open up a new session via the RStudio menu bar (Session […]

Design top down, Code bottom up

Top-down design means designing from the client application programmer interface (API) down to the code. The API lays out a precise functional specification, which says what the code will do, not how it will do it. Coding bottom up means coding the lowest-level foundations first, testing them, then continuing to build. Sometimes this requires dropping […]

A continuous hinge function for statistical modeling

This comes up sometimes in my applied work: I want a continuous “hinge function,” something like the red curve above, connecting two straight lines in a smooth way. Why not include the sharp corner (in this case, the function y=-0.5*x if x<0 or y=0.2*x if x>0)? Two reasons. First, computation: Hamiltonian Monte Carlo can trip […]
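
The excerpt is cut off before the post gives its own formula, so the following is only a sketch of one common way to smooth such a corner: replace the kink with a softplus term. The function name `smooth_hinge` and the parameters `x0`, `a`, and `delta` are assumptions made for illustration; the slopes -0.5 and 0.2 are the ones quoted above.

```python
import numpy as np

def smooth_hinge(x, x0=0.0, a=0.0, b0=-0.5, b1=0.2, delta=0.1):
    """Softplus-smoothed hinge (illustrative, not necessarily the post's form).

    Far to the left of x0 the slope approaches b0; far to the right it
    approaches b1. delta sets the width of the smooth transition.
    """
    x = np.asarray(x, dtype=float)
    # np.logaddexp(0, z) evaluates log(1 + exp(z)) without overflow.
    softplus = delta * np.logaddexp(0.0, (x - x0) / delta)
    return a + b0 * (x - x0) + (b1 - b0) * softplus

xs = np.linspace(-2.0, 2.0, 9)
print(smooth_hinge(xs))
```

With this form the curve does not pass exactly through the original corner: at `x0` it sits at `a + (b1 - b0) * delta * log(2)`, and that offset shrinks as `delta` goes to zero.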

Using Stan for week-by-week updating of estimated soccer team abilities

Milad Kharratzadeh shares this analysis of the English Premier League during last year’s famous season. He fit a Bayesian model using Stan, and the R markdown file is here. The analysis has three interesting features: 1. Team ability is allowed to continuously vary throughout the season; thus, once the season is over, you can see […]

Should computer programming be a prerequisite for learning statistics?

[cat picture] This came up in a recent discussion thread, I can’t remember exactly where. A commenter pointed out, correctly, that you shouldn’t require computer programming as a prerequisite for a statistics course: there’s lots in statistics that can be learned without knowing how to program. Sure, if you can program you can do a […]

Splines in Stan! (including priors that enforce smoothness)

Milad Kharratzadeh shares a new case study. This could be useful to a lot of people. And here’s the markdown file with every last bit of R and Stan code. Just for example, here’s the last section of the document, which shows how to simulate the data and fit the model graphed above: Location of […]

I hate R, volume 38942

link R doesn’t allow block comments. You have to comment out each line, or you can encapsulate the block in if(0){} which is the world’s biggest hack. Grrrrr. P.S. Just to clarify: I want block commenting not because I want to add long explanatory blocks of text to annotate my scripts. I want block commenting […]

Fitting hierarchical GLMs in package X is like driving car Y

Given that Andrew started the Gremlin theme, I thought it would only be fitting to link to the following amusing blog post: Chris Brown: Choosing R packages for mixed effects modelling based on the car you drive (on the seascape models blog) It’s exactly what it says on the tin. I won’t spoil the punchline, […]

Bayesian Posteriors are Calibrated by Definition

Time to get positive. I was asking Andrew whether it’s true that I have the right coverage in Bayesian posterior intervals if I generate the parameters from the prior and the data from the parameters. He replied that yes indeed that is true, and directed me to: Cook, S.R., Gelman, A. and Rubin, D.B. 2006. […]
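
The claim can be checked by simulation. The sketch below draws a parameter from the prior, draws data given that parameter, and records how often the central posterior interval contains the true value. The model (a normal mean with a standard normal prior and unit-variance data) and the function name `interval_coverage` are assumptions chosen so that the posterior has a closed form; they are not taken from the post or from the cited paper.

```python
from statistics import NormalDist

import numpy as np

def interval_coverage(n_sims=20000, n_obs=5, level=0.5, seed=0):
    """Fraction of simulations whose central posterior interval covers theta."""
    rng = np.random.default_rng(seed)
    # Assumed prior: theta ~ Normal(0, 1).
    theta = rng.normal(0.0, 1.0, size=n_sims)
    # Assumed data model: y_i | theta ~ Normal(theta, 1), n_obs draws per theta.
    y = rng.normal(theta[:, None], 1.0, size=(n_sims, n_obs))
    # Conjugate posterior: Normal(sum(y) / (n_obs + 1), 1 / (n_obs + 1)).
    post_mean = y.sum(axis=1) / (n_obs + 1)
    post_sd = np.sqrt(1.0 / (n_obs + 1))
    # Half-width of the central interval at the requested level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    covered = np.abs(theta - post_mean) <= z * post_sd
    return covered.mean()

print(interval_coverage(level=0.5))
print(interval_coverage(level=0.9))
```

Up to Monte Carlo error, the printed fractions should sit close to the nominal levels, because the parameters and data are generated from the same joint distribution that the posterior is computed under.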

Stacking, pseudo-BMA, and AIC type weights for combining Bayesian predictive distributions

This post is by Aki. We have often been asked in the Stan user forum how to do model combination for Stan models. Bayesian model averaging (BMA) by computing marginal likelihoods is challenging in theory and even more challenging in practice using only the MCMC samples obtained from the full model posteriors. Some users have […]

“Scalable Bayesian Inference with Hamiltonian Monte Carlo” (Michael Betancourt’s talk this Thurs at Columbia)

Scalable Bayesian Inference with Hamiltonian Monte Carlo Despite the promise of big data, inferences are often limited not by sample size but rather by systematic effects. Only by carefully modeling these effects can we take full advantage of the data—big data must be complemented with big models and the algorithms that can fit them. One […]

Running Stan with external C++ code

Ben writes: Starting with the 2.13 release, it is much easier to use external C++ code in a Stan program. This vignette briefly illustrates how to do so.

Ensemble Methods are Doomed to Fail in High Dimensions

Ensemble methods [cat picture] By ensemble methods, I (Bob, not Andrew) mean approaches that scatter points in parameter space and then make moves by interpolating or extrapolating among subsets of them. Two prominent examples are Ter Braak’s differential evolution and Goodman and Weare’s walkers. There are extensions and computer implementations of these algorithms. For example, […]
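
To make "moves by interpolating or extrapolating among subsets" concrete, here is a rough sketch of a differential-evolution-style proposal: the current point is shifted along the difference between two other randomly chosen points in the ensemble. Only the proposal is shown; the accept/reject step is left out. The function name `de_proposal`, the `jitter` parameter, and the default scale `2.38 / sqrt(2 * dim)` are assumptions for illustration, not details taken from the post.

```python
import numpy as np

def de_proposal(population, i, rng, gamma=None, jitter=1e-4):
    """Propose a move for member i from the difference of two other members."""
    n_members, dim = population.shape
    if gamma is None:
        # Assumed default scale for the difference vector.
        gamma = 2.38 / np.sqrt(2 * dim)
    # Pick two distinct members other than i.
    others = [m for m in range(n_members) if m != i]
    j, k = rng.choice(others, size=2, replace=False)
    # Small Gaussian noise keeps proposals from being confined to a lattice.
    noise = rng.normal(0.0, jitter, size=dim)
    return population[i] + gamma * (population[j] - population[k]) + noise

rng = np.random.default_rng(1)
population = rng.normal(size=(10, 3))  # 10 ensemble members in 3 dimensions
print(de_proposal(population, 0, rng))
```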