Archive of posts filed under the Statistical computing category.

Bigshot statistician keeps publishing papers with errors; is there anything we can do to get him to stop???

OK, here’s a paper with a true theorem but then some false corollaries. First the theorem: The above is actually ok. It’s all true. But then a few pages later comes the false statement: This is just wrong, for two reasons. First, the relevant reference distribution is discrete uniform, not continuous uniform, so the normal […]

Wolfram on Golomb

I was checking out Stephen Wolfram’s blog and found this excellent obituary of Solomon Golomb, the mathematician who invented the maximum-length linear-feedback shift register sequence, characterized by Wolfram as “probably the single most-used mathematical algorithm idea in history.” But Golomb is probably more famous for inventing polyominoes. The whole thing’s a good read, and it […]

Look. At. The. Data. (Hollywood action movies example)

Kaiser Fung shares an amusing story of how you can be misled by analyzing data that you haven’t fully digested. Kaiser writes, “It pains me to think how many people have analyzed this dataset, and used these keywords to build models.”

Stan Weekly Roundup, 28 July 2017

Here’s the roundup for this past week. Michael Betancourt added case studies for methodology in both Python and R, based on the work he did getting the ML meetup together: RStan workflow, PyStan workflow. Michael Betancourt, along with Mitzi Morris, Sean Talts, and Jonah Gabry, taught the women in ML workshop at Viacom in NYC […]

Animating a spinner using ggplot2 and ImageMagick

It’s Sunday, and I [Bob] am just sitting on the couch peacefully ggplotting to illustrate basic sample spaces using spinners (a trick I’m borrowing from Jim Albert’s book Curve Ball). There’s an underlying continuous outcome (i.e., where the spinner lands) and a quantization into a number of regions to produce a discrete outcome (e.g., “success” […]
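
To make the spinner idea concrete, here is a minimal simulation sketch in Python (the post itself uses ggplot2 in R). It assumes the continuous outcome is an angle drawn uniformly from 0 to 360 degrees and that the "success" region covers a fixed fraction of the circle. The function names and the default `success_fraction` are hypothetical, chosen only for illustration.

```python
import random


def spin(rng):
    """Continuous outcome: the angle (in degrees) where the spinner lands."""
    return rng.uniform(0.0, 360.0)


def quantize(angle, success_fraction=0.25):
    """Discrete outcome: map the angle to a labeled region of the circle.

    Assumption: the "success" region is the arc starting at 0 degrees
    that covers `success_fraction` of the full circle.
    """
    return "success" if angle < 360.0 * success_fraction else "failure"


def estimate_success_rate(n_spins, success_fraction=0.25, seed=1234):
    """Spin repeatedly and report the observed share of successes."""
    rng = random.Random(seed)
    hits = sum(
        quantize(spin(rng), success_fraction) == "success"
        for _ in range(n_spins)
    )
    return hits / n_spins


if __name__ == "__main__":
    # With many spins the observed share should be close to success_fraction.
    print(estimate_success_rate(20000))
```

With many spins, the observed share of successes should settle near the fraction of the circle assigned to the success region, which is the link between the continuous sample space and the discrete one.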

Hey, here are some tools in R and Stan for designing more effective clinical trials! How cool is that?

In statistical work, design and data analysis are often considered separately. Sometimes we do all sorts of modeling and planning in the design stage, only to analyze data using simple comparisons. Other times, we design our studies casually, even thoughtlessly, and then try to salvage what we can using elaborate data analyses. It would be […]

Stan Weekly Roundup, 7 July 2017

Holiday weekend, schmoliday weekend. Ben Goodrich and Jonah Gabry shipped RStan 2.16.2 (their numbering is a little beyond base Stan, which is at 2.16.0). This reintroduces error reporting that got lost in the 2.15 refactor, so please upgrade if you want to debug your Stan programs! Joe Haupt translated the JAGS examples in the second […]

What is a pull request?

Bob explains: A pull request (PR) is the minimal publishable unit of open-source development. It’s a proposed change to the code base that we can then review. If you want to see how the sausage is made, follow this link. If you click on “files changed”, you’ll see what Sean is proposing doing with the […]

Stan Weekly Roundup, 30 June 2017

Here are some things that have been going on with Stan since last week’s roundup: Stan® and the logo were granted U.S. Trademark Registration No. 5,222,891 and U.S. Serial Number 87,237,369, respectively. Hard to feel special when there were millions of products ahead of you. Trademarked names are case insensitive and they required […]

“Developers Who Use Spaces Make More Money Than Those Who Use Tabs”

Rudy Malka writes: I think you’ll enjoy this nice piece of pop regression by David Robinson: developers who use spaces make more money than those who use tabs. I’d like to know your opinion about it. At the above link, Robinson discusses a survey that allows him to compare salaries of software developers who use […]

SPEED: Parallelizing Stan using the Message Passing Interface (MPI)

Sebastian Weber writes: Bayesian inference has to overcome tough computational challenges and thanks to Stan we now have a scalable MCMC sampler available. For a Stan model running NUTS, the computational cost is dominated by gradient calculations of the model log-density as a function of the parameters. While NUTS is scalable to huge parameter spaces, […]

Workshop on reproducibility in machine learning

Alex Lamb writes: My colleagues and I are organizing a workshop on reproducibility and replication for the International Conference on Machine Learning (ICML). I’ve read some of your blog posts on the replication crisis in the social sciences and it seems like this workshop might be something that you’d be interested in. We have three […]

Using external C++ functions with PyStan & radial velocity exoplanets

Dan Foreman-Mackey writes: I [Mackey] demonstrate how to use a custom C++ function in a Stan model using the Python interface PyStan. This was previously only possible using the R interface RStan (see an example here) so I hacked PyStan to make this possible in Python as well. . . . I have some existing […]

Another serious error in my published work!

Uh oh, I’m starting to feel like that pizzagate guy . . . Here’s the background. When I talk about my serious published errors, I talk about my false theorem, I talk about my empirical analysis that was invalidated by miscoded data, I talk about my election maps whose flaws were pointed out by an angry […]

Hello, world! Stan, PyMC3, and Edward

Being a computer scientist, I like to see “Hello, world!” examples of programming languages. Here, I’m going to run down how Stan, PyMC3 and Edward tackle a simple linear regression problem with a couple of predictors. No, I’m not going to take sides—I’m on a fact-finding mission. We (the Stan development team) have been trying […]

Theoretical Statistics is the Theory of Applied Statistics: How to Think About What We Do

Above is my talk at the 2017 New York R conference. Look, no slides! The talk went well. I think the video would be more appealing to listen to if they’d mixed in more of the crowd noise. Then you’d hear people laughing at all the right spots. P.S. Here’s my 2016 NYR talk, and […]

Visualizing your fitted Stan model using ShinyStan without interfering with your RStudio session

ShinyStan is great, but I don’t always use it because when you call it from R, it freezes up your R session until you close the ShinyStan window. But it turns out that it doesn’t have to be that way. Imad explains: You can open up a new session via the RStudio menu bar (Session […]

Design top down, Code bottom up

Top-down design means designing from the client application programmer interface (API) down to the code. The API lays out a precise functional specification, which says what the code will do, not how it will do it. Coding bottom up means coding the lowest-level foundations first, testing them, then continuing to build. Sometimes this requires dropping […]

A continuous hinge function for statistical modeling

This comes up sometimes in my applied work: I want a continuous “hinge function,” something like the red curve above, connecting two straight lines in a smooth way. Why not include the sharp corner (in this case, the function y=-0.5*x if x<0 or y=0.2*x if x>0)? Two reasons. First, computation: Hamiltonian Monte Carlo can trip […]
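
As a rough illustration of what such a function can look like, here is a minimal Python sketch. It assumes one common construction, a softplus-smoothed hinge, which may or may not match the curve in the post. The slopes -0.5 and 0.2 come from the excerpt. The names `smooth_hinge`, `x0`, `a`, and `delta` are hypothetical, with `delta` controlling how wide the rounded corner is.

```python
import math


def sharp_hinge(x):
    """The sharp-cornered version from the text: slope -0.5 left of 0, 0.2 right of 0."""
    return -0.5 * x if x < 0 else 0.2 * x


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def smooth_hinge(x, x0=0.0, a=0.0, b0=-0.5, b1=0.2, delta=0.1):
    """Smooth curve joining a line of slope b0 (left) to one of slope b1 (right).

    Assumption: the corner at x0 is rounded with a softplus term.
    Far below x0 the softplus term vanishes, leaving a + b0 * (x - x0).
    Far above x0 it approaches (x - x0), giving a + b1 * (x - x0).
    """
    return a + b0 * (x - x0) + (b1 - b0) * delta * softplus((x - x0) / delta)


if __name__ == "__main__":
    for x in (-2.0, -0.1, 0.0, 0.1, 2.0):
        print(x, sharp_hinge(x), smooth_hinge(x))
```

Away from the corner the two functions agree closely; near the corner the smooth version has a derivative everywhere, and shrinking `delta` brings it closer to the sharp hinge.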

Using Stan for week-by-week updating of estimated soccer team abilities

Milad Kharratzadeh shares this analysis of the English Premier League during last year’s famous season. He fit a Bayesian model using Stan, and the R markdown file is here. The analysis has three interesting features: 1. Team ability is allowed to continuously vary throughout the season; thus, once the season is over, you can see […]