Archive of posts filed under the Statistical computing category.

Some U.S. demographic data at zipcode level conveniently in R

Ari Lamstein writes: I chuckled when I read your recent “R Sucks” post. Some of the comments were a bit … heated … so I thought to send you an email instead. I agree with your point that some of the datasets in R are not particularly relevant. The way that I’ve addressed that is […]
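The excerpt cuts off before Ari names his package, so here is only a hedged sketch of the general idea of pulling zip-level demographics into an analysis; the file name and column names (region, per_capita_income) are hypothetical stand-ins, not his actual API.

library(dplyr)

# Hypothetical zip-level demographics file; the layout is a stand-in.
zip_demo <- read.csv("zip_demographics.csv",
                     colClasses = c(region = "character"))  # keep leading zeros

zip_demo %>%
  summarise(n_zips = n(),
            median_income = median(per_capita_income, na.rm = TRUE))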

Deep learning, model checking, AI, the no-homunculus principle, and the unitary nature of consciousness

Bayesian data analysis, as my colleagues and I have formulated it, has a human in the loop. Here’s how we put it on the very first page of our book: The process of Bayesian data analysis can be idealized by dividing it into the following three steps: 1. Setting up a full probability model—a joint […]
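Since the excerpt stops mid-list, here is a hedged rstanarm sketch of the loop the book describes: set up a probability model, condition on observed data, then have the human check the fit before going around again. The model and data are placeholders, not the book's example.

library(rstanarm)

# Steps 1-2: set up a full probability model and condition on the data.
fit <- stan_glm(mpg ~ wt + cyl, data = mtcars,
                chains = 4, iter = 2000, refresh = 0)

# Step 3: check the model -- compare replicated datasets drawn from the
# posterior predictive distribution to the data we actually observed.
pp_check(fit)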

Only on the internet . . .

I had this bizarrely escalating email exchange. It started with this completely reasonable message: Professor, I was unable to run your code here: https://www.r-bloggers.com/downloading-option-chain-data-from-google-finance-in-r-an-update/ Besides a small typo [you have a 1 after names (options)], the code fails when you actually run the function. The error I get is a lexical error: Error: lexical error: […]

Kaggle Kernels

Anthony Goldbloom writes: In late August, Kaggle launched an open data platform where data scientists can share data sets. In the first few months, our members have shared over 300 data sets on topics ranging from election polls to EEG brainwave data. It’s only a few months old, but it’s already a rich repository for […]

Stan Webinar, Stan Classes, and StanCon

This post is by Eric. We have a number of Stan-related events in the pipeline. On 22 Nov, Ben Goodrich and I will be holding a free webinar called Introduction to Bayesian Computation Using the rstanarm R Package. Here is the abstract: The goal of the rstanarm package is to make it easier to use Bayesian […]
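For readers who can't make the webinar, the appeal of rstanarm is roughly this: you keep the glm() formula interface and get a full posterior back. A minimal sketch (not webinar material, and the model is arbitrary):

library(rstanarm)

# Same formula syntax as glm(), but the result is a posterior sample.
fit <- stan_glm(vs ~ wt + hp, data = mtcars,
                family = binomial(link = "logit"),
                prior = normal(0, 2.5), refresh = 0)

summary(fit)                          # posterior summaries and diagnostics
posterior_interval(fit, prob = 0.5)   # 50% posterior intervals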

Stan Case Studies: A good way to jump in to the language

Wanna learn Stan? Everybody’s talking ’bout it. Here’s a way to jump in: Stan Case Studies. Find one you like and try it out. P.S. I blogged this last month but it’s so great I’m blogging it again. For this post, the target audience is not already-users of Stan but new users.

Recently in the sister blog and elsewhere

Why it can be rational to vote (see also this by Robert Wiblin, “Why the hour you spend voting is the most socially impactful of all”); Be skeptical when polls show the presidential race swinging wildly; The polls of the future will be reproducible and open source; Testing the role of convergence in language acquisition, […]

Why I prefer 50% rather than 95% intervals

I prefer 50% to 95% intervals for 3 reasons: 1. Computational stability, 2. More intuitive evaluation (half the 50% intervals should contain the true value), 3. A sense that in applications it’s best to get a sense of where the parameters and predicted values will be, not to attempt an unrealistic near-certainty. This came up […]
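Reason 2 is easy to check by simulation; a minimal sketch with a toy normal example, not tied to any particular model:

set.seed(123)
n_sims <- 1e4
theta <- rnorm(n_sims)                  # true values
est   <- theta + rnorm(n_sims)          # estimates with known sd = 1
lower <- est + qnorm(0.25)              # 50% interval: est +/- 0.674
upper <- est + qnorm(0.75)
mean(theta >= lower & theta <= upper)   # should come out near 0.50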

Michael Betancourt has made NUTS even more awesome and efficient!

In a beautiful new paper, Betancourt writes: The geometric foundations of Hamiltonian Monte Carlo implicitly identify the optimal choice of [tuning] parameters, especially the integration time. I then consider the practical consequences of these principles in both existing algorithms and a new implementation called Exhaustive Hamiltonian Monte Carlo [XMC] before demonstrating the utility of these […]
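The paper is about choosing those tuning parameters automatically; for orientation, the same knobs are what the rstan interface already exposes during adaptation. A hedged sketch of where they live (toy model, not Betancourt's XMC algorithm):

library(rstan)

model_code <- "
data { int<lower=0> N; vector[N] y; }
parameters { real mu; real<lower=0> sigma; }
model {
  mu ~ normal(0, 10);
  sigma ~ normal(0, 5);
  y ~ normal(mu, sigma);
}
"

fit <- stan(model_code = model_code,
            data = list(N = 20, y = rnorm(20)),
            control = list(adapt_delta = 0.95,    # target acceptance rate
                           max_treedepth = 12))   # cap on integration length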

Some modeling and computational ideas to look into

Can we implement these in Stan? Marginally specified priors for non-parametric Bayesian estimation (by David Kessler, Peter Hoff, and David Dunson): Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of […]

“It’s not reproducible if it only runs on your laptop”: Jon Zelner’s tips for a reproducible workflow in R and Stan

Jon Zelner writes: Reproducibility is becoming more and more a part of the conversation when it comes to public health and social science research. . . . But comparatively little has been said about another dimension of the reproducibility crisis, which is the difficulty of re-generating already-complete analyses using the exact same input data. But […]
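Jon's actual tips go well beyond this, but as a minimal sketch of the baseline hygiene the title is pointing at (none of this is his setup): fix the seed, keep paths relative to the project rather than to one laptop, and record the software that produced the output.

set.seed(20161107)                    # fixed RNG seed for the analysis

fit <- lm(mpg ~ wt, data = mtcars)    # stand-in for the real analysis

dir.create("output", showWarnings = FALSE)
saveRDS(fit, file.path("output", "fit.rds"))
writeLines(capture.output(sessionInfo()),
           file.path("output", "session_info.txt"))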

Yes, despite what you may have heard, you can easily fit hierarchical mixture models in Stan

There was some confusion on the Stan list that I wanted to clear up, having to do with fitting mixture models. Someone quoted this from John Kruschke’s book, Doing Bayesian Data Analysis: The lack of discrete parameters in Stan means that we cannot do model comparison as a hierarchical model with an indexical parameter at […]
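To make the point concrete, here is a minimal two-component normal mixture with the discrete component indicators marginalized out of the likelihood via log_mix; a sketch only, not the hierarchical model-comparison setup Kruschke is discussing.

library(rstan)

mix_code <- "
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real<lower=0, upper=1> lambda;   // mixing proportion
  ordered[2] mu;                   // ordered to break label switching
  vector<lower=0>[2] sigma;
}
model {
  mu ~ normal(0, 10);
  sigma ~ normal(0, 5);
  for (n in 1:N)
    target += log_mix(lambda,
                      normal_lpdf(y[n] | mu[1], sigma[1]),
                      normal_lpdf(y[n] | mu[2], sigma[2]));
}
"

y <- c(rnorm(100, -2, 1), rnorm(100, 2, 1))
fit <- stan(model_code = mix_code, data = list(N = length(y), y = y))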

Practical Bayesian model evaluation in Stan and rstanarm using leave-one-out cross-validation

Our (Aki, Andrew and Jonah) paper Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC was recently published in Statistics and Computing. In the paper we show (1) why it’s better to use LOO instead of WAIC for model evaluation, (2) how to compute LOO quickly and reliably using the full posterior sample, and (3) how Pareto smoothed importance […]
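In practice this is a few lines with the loo package; shown here with rstanarm fits as a sketch (the models are arbitrary):

library(rstanarm)
library(loo)

fit1 <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0)
fit2 <- stan_glm(mpg ~ wt + cyl, data = mtcars, refresh = 0)

loo1 <- loo(fit1)         # PSIS-LOO, with Pareto k diagnostics
loo2 <- loo(fit2)
loo_compare(loo1, loo2)   # difference in expected log predictive density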

Mathematica, now with Stan

Vincent Picaud developed a Mathematica interface to Stan: MathematicaStan You can find everything you need to get started by following the link above. If you have questions, comments, or suggestions, please let us know through the Stan user’s group or the GitHub issue tracker. MathematicaStan interfaces to Stan through a CmdStan process. Stan programs are […]

Tenure Track Professor in Machine Learning, Aalto University, Finland

Posted by Aki. I promise that next time I’ll post something other than a job advertisement, but before that here’s another great opportunity to join Aalto University where I work, too. “We are looking for a professor to either further strengthen our strong research fields, with keywords including statistical machine learning, probabilistic modelling, Bayesian inference, […]

Stan case studies!

In the spirit of reproducible research, we (that is, Bob*) set up this beautiful page of Stan case studies. Check it out. * Bob here. Michael set the site up, I set this page up, and lots of people have contributed case studies; we’re always looking for more to publish.

“Crimes Against Data”: My talk at Ohio State University this Thurs; “Solving Statistics Problems Using Stan”: My talk at the University of Michigan this Fri

Crimes Against Data Statistics has been described as the science of uncertainty. But, paradoxically, statistical methods are often used to create a sense of certainty where none should exist. The social sciences have been rocked in recent years by highly publicized claims, published in top journals, that were reported as “statistically significant” but are implausible […]

Let’s play Twister, let’s play Risk

Alex Terenin, Dan Simpson, and David Draper write: Some months ago we shared with you an arXiv draft of our paper, Asynchronous Distributed Gibbs Sampling. Through comments we’ve received, for which we’re highly grateful, we came to understand that (a) our convergence proof was wrong, and (b) we actually have two algorithms, one exact and […]

Stan users group hits 2000 registrations

Of course, there are bound to be duplicate emails, dead emails, and people who picked up Stan, joined the list, and never came back. But still, that’s a lot of people who’ve expressed interest! It’s been an amazing ride that’s only going to get better as we learn more and continue to improve Stan’s speed […]

Several postdoc positions in probabilistic modeling and machine learning in Aalto, Helsinki

This post is by Aki. In addition to the postdoc position I advertised recently, now Aalto University and University of Helsinki have 20 more open postdoc and research fellow positions. Many of the positions are in probabilistic models and machine learning. You could work with me (I’m also part of HIIT), but I can also […]