Archive of posts filed under the Statistical computing category.

Will Stanton hit 61 home runs this season?

[edit: Juho Kokkala corrected my homework. Thanks! I updated the post. Also see some further elaboration in my reply to Andrew’s comment. As Andrew likes to say …] So far, Giancarlo Stanton has hit 56 home runs in 555 at bats over 149 games. Miami has 10 games left to play. What’s the chance he’ll […]
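One rough way to frame the question in the excerpt above is as a posterior predictive calculation. The sketch below is not the post's own analysis (the excerpt is cut off before it gets there). It assumes a uniform Beta(1, 1) prior on the per-at-bat home run probability, assumes Stanton keeps getting at bats at his season rate over the remaining 10 games, and treats at bats as exchangeable. The function names are made up for illustration.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(k successes in n future trials) when the success rate is Beta(a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def prob_at_least(needed, n_future, successes, trials, prior_a=1.0, prior_b=1.0):
    """Posterior predictive probability of at least `needed` more successes."""
    a = prior_a + successes              # posterior Beta parameters
    b = prior_b + trials - successes
    return sum(beta_binomial_pmf(k, n_future, a, b)
               for k in range(needed, n_future + 1))


# Numbers from the excerpt: 56 home runs in 555 at bats over 149 games, 10 games left.
home_runs, at_bats, games, games_left = 56, 555, 149, 10

# Assumption: at bats per game stay at the season average (about 37 more at bats).
future_at_bats = round(at_bats / games * games_left)

p_61 = prob_at_least(61 - home_runs, future_at_bats, home_runs, at_bats)
print(f"Pr[at least 61 home runs] is roughly {p_61:.2f}")
```

Changing the prior or the assumed number of remaining at bats will move the answer, so this is best read as a back-of-the-envelope check rather than a forecast.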

Call for papers: Probabilistic Programming Languages, Semantics, and Systems (PPS 2018)

I’m on the program committee and they say they’re looking to broaden their horizons this year to include systems like Stan. The workshop is part of POPL, the big programming language theory conference. Here are the official links: PPS 2018 home page; Call for extended abstracts (2 pages). The submissions are two-page extended abstracts and the […]

The fundamental abstractions underlying BUGS and Stan as probabilistic programming languages

Probabilistic programming languages: I think of BUGS and Stan as probabilistic programming languages because their variables may be used to denote random variables, with function application doing the right thing in terms of propagating randomness (usually encoding uncertainty in a Bayesian setting). They are not probabilistic programming languages that provide an object language for inference; […]

Iterative importance sampling

Aki points us to some papers: Langevin Incremental Mixture Importance Sampling; Parallel Adaptive Importance Sampling; Iterative importance sampling algorithms for parameter estimation problems. The next one is not iterative, but it is interesting in another way: Black-box Importance Sampling. Importance sampling is what you call it when you’d like to have draws of theta from some target distribution […]
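For readers who have not seen it, here is a minimal sketch of plain (non-iterative) self-normalized importance sampling. It is not taken from any of the papers listed above. The target (a standard normal known only up to a constant), the wider normal proposal, and all function names are assumptions chosen to keep the example small.

```python
import math
import random


def log_target(theta):
    """Unnormalized log density of the target: a standard normal."""
    return -0.5 * theta ** 2


def log_proposal(theta, scale):
    """Unnormalized log density of the proposal: normal(0, scale)."""
    return -0.5 * (theta / scale) ** 2


def importance_estimate(f, n_draws=50_000, scale=2.0, seed=1234):
    """Estimate E[f(theta)] under the target using draws from the proposal."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, scale) for _ in range(n_draws)]

    # Log importance ratios; constants cancel once the weights are normalized.
    log_w = [log_target(t) - log_proposal(t, scale) for t in draws]
    max_log_w = max(log_w)                      # subtract the max for stability
    w = [math.exp(lw - max_log_w) for lw in log_w]

    total = sum(w)
    estimate = sum(wi * f(t) for wi, t in zip(w, draws)) / total
    ess = total ** 2 / sum(wi * wi for wi in w)  # effective sample size
    return estimate, ess


second_moment, ess = importance_estimate(lambda t: t * t)
print(f"E[theta^2] estimate: {second_moment:.3f}, effective sample size: {ess:.0f}")
```

The proposal is deliberately wider than the target so the importance ratios stay bounded. A proposal narrower than the target would give heavy-tailed weights, which is the kind of problem adaptive and iterative schemes try to work around.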

Stan Weekly Roundup, 25 August 2017

This week, the entire Columbia portion of the Stan team is out of the office and we didn’t have an in-person/online meeting this Thursday. Mitzi and I are on vacation, and everyone else is either teaching, TA-ing, or attending the Stan course. Luckily for this report, there’s been some great activity out of the meeting […]

Bigshot statistician keeps publishing papers with errors; is there anything we can do to get him to stop???

OK, here’s a paper with a true theorem but then some false corollaries. First the theorem: The above is actually ok. It’s all true. But then a few pages later comes the false statement: This is just wrong, for two reasons. First, the relevant reference distribution is discrete uniform, not continuous uniform, so the normal […]

Wolfram on Golomb

I was checking out Stephen Wolfram’s blog and found this excellent obituary of Solomon Golomb, the mathematician who invented the maximum-length linear-feedback shift register sequence, characterized by Wolfram as “probably the single most-used mathematical algorithm idea in history.” But Golomb is probably more famous for inventing polyominoes. The whole thing’s a good read, and it […]

Look. At. The. Data. (Hollywood action movies example)

Kaiser Fung shares an amusing story of how you can be misled by analyzing data that you haven’t fully digested. Kaiser writes, “It pains me to think how many people have analyzed this dataset, and used these keywords to build models.”

Stan Weekly Roundup, 28 July 2017

Here’s the roundup for this past week. Michael Betancourt added case studies for methodology in both Python and R, based on the work he did getting the ML meetup together: RStan workflow; PyStan workflow. Michael Betancourt, along with Mitzi Morris, Sean Talts, and Jonah Gabry, taught the women in ML workshop at Viacom in NYC […]

Animating a spinner using ggplot2 and ImageMagick

It’s Sunday, and I [Bob] am just sitting on the couch peacefully ggplotting to illustrate basic sample spaces using spinners (a trick I’m borrowing from Jim Albert’s book Curve Ball). There’s an underlying continuous outcome (i.e., where the spinner lands) and a quantization into a number of regions to produce a discrete outcome (e.g., “success” […]
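The continuous-then-quantized idea in the excerpt above can be simulated in a few lines. This is only a sketch in Python, not the ggplot2 and ImageMagick code from the post. The 30% "success" region and the function names are assumptions for illustration.

```python
import math
import random


def spin(rng):
    """Continuous outcome: the angle (in radians) where the spinner lands."""
    return rng.uniform(0.0, 2.0 * math.pi)


def quantize(angle, success_fraction=0.3):
    """Discrete outcome: the first `success_fraction` of the circle is a success."""
    return "success" if angle < success_fraction * 2.0 * math.pi else "failure"


rng = random.Random(2017)
n_spins = 100_000
outcomes = [quantize(spin(rng)) for _ in range(n_spins)]
success_rate = outcomes.count("success") / n_spins
print(f"fraction of successes: {success_rate:.3f}")
```

With many spins the fraction of successes should settle near the size of the success region, which is what makes the spinner a handy picture of a discrete probability.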

Hey—here are some tools in R and Stan to designing more effective clinical trials! How cool is that?

In statistical work, design and data analysis are often considered separately. Sometimes we do all sorts of modeling and planning in the design stage, only to analyze data using simple comparisons. Other times, we design our studies casually, even thoughtlessly, and then try to salvage what we can using elaborate data analyses. It would be […]

Stan Weekly Roundup, 7 July 2017

Holiday weekend, schmoliday weekend. Ben Goodrich and Jonah Gabry shipped RStan 2.16.2 (their numbering is a little beyond base Stan, which is at 2.16.0). This reintroduces error reporting that got lost in the 2.15 refactor, so please upgrade if you want to debug your Stan programs! Joe Haupt translated the JAGS examples in the second […]

What is a pull request?

Bob explains: A pull request (PR) is the minimal publishable unit of open-source development. It’s a proposed change to the code base that we can then review. If you want to see how the sausage is made, follow this link. If you click on “files changed”, you’ll see what Sean is proposing doing with the […]

Stan Weekly Roundup, 30 June 2017

Here’s some things that have been going on with Stan since the last week’s roundup: Stan® and the logo were granted a U.S. Trademark Registration No. 5,222,891 and a U.S. Serial Number: 87,237,369, respectively. Hard to feel special when there were millions of products ahead of you. Trademarked names are case insensitive and they required […]

“Developers Who Use Spaces Make More Money Than Those Who Use Tabs”

Rudy Malka writes: I think you’ll enjoy this nice piece of pop regression by David Robinson: developers who use spaces make more money than those who use tabs. I’d like to know your opinion about it. At the above link, Robinson discusses a survey that allows him to compare salaries of software developers who use […]

SPEED: Parallelizing Stan using the Message Passing Interface (MPI)

Sebastian Weber writes: Bayesian inference has to overcome tough computational challenges and thanks to Stan we now have a scalable MCMC sampler available. For a Stan model running NUTS, the computational cost is dominated by gradient calculations of the model log-density as a function of the parameters. While NUTS is scalable to huge parameter spaces, […]

Workshop on reproducibility in machine learning

Alex Lamb writes: My colleagues and I are organizing a workshop on reproducibility and replication for the International Conference on Machine Learning (ICML). I’ve read some of your blog posts on the replication crisis in the social sciences and it seems like this workshop might be something that you’d be interested in. We have three […]

Using external C++ functions with PyStan & radial velocity exoplanets

Dan Foreman-Mackey writes: I [Mackey] demonstrate how to use a custom C++ function in a Stan model using the Python interface PyStan. This was previously only possible using the R interface RStan (see an example here) so I hacked PyStan to make this possible in Python as well. . . . I have some existing […]

Another serious error in my published work!

Uh oh, I’m starting to feel like that pizzagate guy . . . Here’s the background. When I talk about my serious published errors, I talk about my false theorem, I talk about my empirical analysis that was invalidated by miscoded data, I talk about my election maps whose flaws were pointed out by an angry […]

Hello, world! Stan, PyMC3, and Edward

Being a computer scientist, I like to see “Hello, world!” examples of programming languages. Here, I’m going to run down how Stan, PyMC3 and Edward tackle a simple linear regression problem with a couple of predictors. No, I’m not going to take sides—I’m on a fact-finding mission. We (the Stan development team) have been trying […]