Stata users have access to two easy-to-use implementations of Bayesian inference: Stata's native bayesmh command and StataStan, which calls the general Bayesian engine Stan. We compare these on two models that are important for education research: the Rasch model and the hierarchical Rasch model. Stan (as called from Stata) fits a wider range of models than bayesmh can and is also more scalable, in that it could easily fit models with at least ten times more parameters than Stata's native Bayesian implementation could handle. In addition, Stan runs between two and ten times faster than bayesmh as measured in effective sample size per second: that is, compared to Stan, it takes Stata's built-in Bayesian engine two to ten times as long to get inferences of equivalent precision. We attribute Stan's advantage in flexibility to its general modeling language, and its advantages in scalability and speed to an efficient sampling algorithm: Hamiltonian Monte Carlo using the no-U-turn sampler. To further investigate scalability, we also ran the package JAGS, which performed better than Stata's native Bayesian engine but not as well as StataStan.
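For concreteness, here is roughly what the simpler of the two models looks like in Stan's modeling language. This is a minimal Rasch-model sketch of our own, not code from the paper, and it assumes the data arrive in long format with one row per person-item response (the names ii, jj, and y are our choice):

```stan
// Rasch model: Pr(y = 1) = inv_logit(theta[person] - b[item])
data {
  int<lower=1> I;                      // number of items
  int<lower=1> J;                      // number of persons
  int<lower=1> N;                      // number of observations
  array[N] int<lower=1, upper=I> ii;   // item for observation n
  array[N] int<lower=1, upper=J> jj;   // person for observation n
  array[N] int<lower=0, upper=1> y;    // response: correct (1) or not (0)
}
parameters {
  vector[J] theta;                     // person abilities
  vector[I] b;                         // item difficulties
}
model {
  theta ~ normal(0, 1);                // pins down the scale of the latent trait
  b ~ normal(0, 10);                   // weakly informative prior on difficulties
  y ~ bernoulli_logit(theta[jj] - b[ii]);
}
```

The unit-normal prior on the abilities is doing double duty here: it both expresses prior information and identifies the model, since the logistic term is otherwise invariant to shifting and rescaling theta and b together.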
Here’s the punchline:
This is no surprise; still, it's reassuring to see. (The lines in the graphs look a little jagged because we ran just one simulation, but the results are clear enough.)
Stan’s real advantage comes not just from speed but from flexibility—Stan can fit any continuous parameter model for which you can write the log-density—and from scalability: you can fit bigger models to bigger datasets. We’re moving closer to a one-size-fits-most data analysis tool where we don’t have to jump from program to program as our data and modeling needs increase.
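As one illustration of that flexibility (again our sketch, not code from the paper): going from the Rasch model to its hierarchical version in Stan is a matter of a few extra lines, letting the item difficulties and person abilities have population distributions whose scales are estimated from the data rather than fixed in advance.

```stan
// Hierarchical Rasch model: the item difficulties b get an estimated
// population distribution, and the ability scale is estimated too.
data {
  int<lower=1> I;                      // items
  int<lower=1> J;                      // persons
  int<lower=1> N;                      // observations
  array[N] int<lower=1, upper=I> ii;   // item for observation n
  array[N] int<lower=1, upper=J> jj;   // person for observation n
  array[N] int<lower=0, upper=1> y;    // correct/incorrect
}
parameters {
  vector[J] theta;                     // person abilities
  vector[I] b;                         // item difficulties
  real mu_b;                           // mean item difficulty
  real<lower=0> sigma_b;               // spread of item difficulties
  real<lower=0> sigma_theta;           // spread of person abilities
}
model {
  theta ~ normal(0, sigma_theta);      // abilities share an estimated scale
  b ~ normal(mu_b, sigma_b);           // difficulties share an estimated distribution
  y ~ bernoulli_logit(theta[jj] - b[ii]);
}
```

Because Stan works directly from the log-density, this kind of extension never requires waiting for a packaged routine to support it: if you can write the model down, you can fit it.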