More Stan on the blog

Whoa. Stan is 3 years old. We’ve come a long way since the start. I came into the project just as a working prototype was implemented by Matt and Bob with discussions with Andrew, Ben, Michael Malecki, Jiqiang, and others. (I had been working for Andrew prior to the official start of the project, but was on the injured reserve list at the time.)

Just for kicks, I looked back at the documentation and the commit log for v1.0.0. The first version was fully functional — it was definitely more than just a prototype. I remember feeling a bit nervous about the release, but not really apprehensive. Most of our meetings were about how we could make Stan faster and what we wanted to implement next.

Fast forward 3 years and Stan is still moving along. We now have a family of projects. We’ve split the math and automatic differentiation library into its own Stan Math Library; the Stan Library is being shaped into the language and the inference algorithms; there are interfaces for the command line, R, Python, Julia, Matlab, and Stata; and there are additional supporting libraries like ShinyStan and stan-mode (syntax highlighting for Emacs), with some more stuff coming down the line. We’re still talking about how to make Stan faster and what to implement next.

I’m going to start posting more regularly on the blog with Stan-related posts:
– general advice
– how to do easy stuff
– how to do expert stuff
– walk-throughs of models
– ideas that we want help on

If you have any suggestions, let me know.


  1. Tom says:

    It might be interesting to pull together some of the stuff on Stan diagnostics and how to interpret and use them. This is already in the user group and the manual, but it could be useful to bring it all together in one place.

  2. Richard McElreath says:

    I think one of Stan’s major selling points is how much easier it is to diagnose a bad NUTS chain than a bad Gibbs chain. Maybe a simple example of this point would be useful.

  3. nah says:

    Is Stan reasonable for an Elo simulation investigation as opposed to other programming options?

    I have a hobby interest in an online competitive game that uses an Elo-derived matchmaking algorithm to match 2 teams of 5. I want to do a series of investigations into different known quirks of the system and how they affect matchmaking quality and rating distribution (inflation/deflation).

    Assumptions/data include: some old company-disclosed information on different rating quantiles; an assumption that true skill rating would be normally distributed; actual rating being 0-bounded and approximately normal; a case-study sort of series of data in which people with a high and steady rating record their wins/losses after taking over a low-rated account; and the various mechanics of the rating system: +/-25 rating for even games, varying up to +/-47 vs -/+3 if prematch ratings would indicate odds of approximately 16:1.

    Quirks include:
    1. How the 0 lower bound affects the distribution of rating and match quality (defined as the difference between the summed actual skill values of both 5-player teams, for teams evenly matched in rating).
    2. Effects of an analytical (non-Elo) quick calibration system (new players are placed to within 500 hidden rating regardless of win/loss record, based on undisclosed in-game metrics).
    3. Effect on rating distribution and match quality of account selling, in which a very high rated player creates a new account, plays it until it’s rated, and sells it to a lower rated buyer (who tends to lose games back to their old rating, often just buying another). Would look into the effects for various rates of prevalence: 0.1%, 0.5%, 1%, 5%.
    4. Similar analysis on boosting, in which the higher rated player simply logs in with the lesser player’s credentials.
    5. Analysis of regional effects: the game is played all over the world and most matchmaking is regionally segregated for connectivity reasons, so ratings are not directly comparable for cultural and other reasons (population size).
    6. Related is an intersection of 3., 4. and 5., since poorer Asian regions are anecdotally the regions used by account boosters and sellers, while accounts are sold to more affluent regions, i.e. the EU or US.
    7. Finally, a look at a rule that soft-limits hidden rating to 4500 during a forced pre-public rating period (approximately 120 games). To put that in some perspective, median rating was 2250, top 1% was over 4100, professional players are typically over 6000, and the top player is ~8000. Anecdotally, the desired range for sold accounts is 5000+.

    My guess is that the most appropriate choice for this would be a general-purpose language like Python (I wrote a bit in F#), but I hear Stan is a complete language these days. I follow Andrew’s blog for non-Stan-related reasons and just figured I’d ask after his pride and joy.

    • If you’re only running simulations and don’t need to fit, I’d just use R.

      As soon as you need to fit model parameters based on data, then you’re better off with Stan as it’s going to give you the flexibility in modeling language you need without forcing you to become an optimization, approximation, or MCMC expert.
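
For readers who want to try the simulation-only route first, here is a minimal Python sketch of the rating mechanics quoted in the question. It is not the game's actual formula, which is undisclosed. It assumes the standard Elo logistic curve with the conventional 400-point scale and a K-factor of 50. The K-factor was picked only because it reproduces the quoted swings: +/-25 for an even game, and roughly +/-47 vs -/+3 at 16:1 odds. The names `expected_score` and `rating_change` are hypothetical helpers, and the sketch rates two sides rather than modeling 5-player teams.

```python
import math

K = 50.0       # assumed K-factor; chosen because it reproduces the +/-25 even-game swing
SCALE = 400.0  # conventional Elo scale; an assumption, not disclosed by the game


def expected_score(rating_a, rating_b):
    """Probability that side A beats side B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / SCALE))


def rating_change(rating_a, rating_b, a_won):
    """Points side A gains (positive) or loses (negative) after one game."""
    actual = 1.0 if a_won else 0.0
    return K * (actual - expected_score(rating_a, rating_b))


# Rating gap that corresponds to 16:1 odds on the standard curve (about 482 points).
gap_16_to_1 = SCALE * math.log10(16)

# 2250 is the median rating mentioned in the question; any baseline gives the same swings.
even_win = rating_change(2250, 2250, True)
favorite_win = rating_change(2250 + gap_16_to_1, 2250, True)
underdog_win = rating_change(2250, 2250 + gap_16_to_1, True)

print(round(even_win), round(favorite_win), round(underdog_win))
```

Under these assumptions the script prints `25 3 47`, matching the swings in the question. Quirks such as the 0 lower bound or the 4500 soft cap could be layered on by clamping ratings after each update, but how the game actually applies them is not stated here.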
