Archive of posts filed under the Statistical computing category.

Google Translate for code, and an R help-list bot

What we did in our Stan meeting yesterday: Some discussion of revision of the Nuts paper, some conversations about parameterizations of categorical-data models, plans for the R interface, blah blah blah. But also, I had two exciting new ideas! Google Translate for code Wouldn’t it be great if Google Translate could work on computer languages? [...]

Understanding simulations in terms of predictive inference?

David Hogg writes: My (now deceased) collaborator and guru in all things inference, Sam Roweis, used to emphasize to me that we should evaluate models in the data space — not the parameter space — because models are always effectively “effective” and not really, fundamentally true. Or, in other words, models should be compared in [...]

Hierarchical/multilevel modeling with “big data”

Dean Eckles writes: I make extensive use of random effects models in my academic and industry research, as they are very often appropriate. However, with very large data sets, I am not sure what to do. Say I have thousands of levels of a grouping factor, and the number of observations totals in the billions. [...]

Factual – a new place to find data

Factual collects data on a variety of topics, organizes them, and allows easy access. If you ever wanted to do a histogram of calorie content in Starbucks coffees or plot warnings with a live feed of earthquake data – your life should be a bit simpler now. Also see DataMarket, InfoChimps, and a few older [...]

Web equation

Aleks sends along this app which, while cute, is not quite “killer” for me. I find it more difficult to write the equation using the trackpad than to simply type it in using LaTeX! But I suppose it could be useful to beginners who want their papers to look more like science.

Lessons learned from a recent R package submission

R has zillions of packages, and people are submitting new ones each day. The volunteers who keep R going are doing an incredibly useful service to the profession, and they’re busy. A colleague sends in some suggestions based on a recent experience with a package update: 1. Always use the R dev version to write [...]

Stan: A (Bayesian) Directed Graphical Model Compiler

Here’s Bob’s talk from the NYC machine learning meetup. And here’s Stan himself:

Bob on Stan

Thurs 19 Jan 7pm at the NYC Machine Learning meetup. Stan’s entirely publicly funded and open-source and it has no secrets. Ask us about it and we’ll tell you everything you might want to know. P.S. And here’s the talk.

Path sampling for models of varying dimension

Somebody asks:

Ripley on model selection, and some links on exploratory model analysis

This is really fun. I love how Ripley thinks, with just about every concept considered in broad generality while being connected to real-data examples. He’s a great statistical storyteller as well. . . . and Wickham on exploratory model analysis I came across Ripley’s slides in a reference from Hadley Wickham’s article on exploratory model [...]

Towards a Theory of Trust in Networks of Humans and Computers

Hey, this looks cool: Towards a Theory of Trust in Networks of Humans and Computers Virgil Gligor Carnegie Mellon University We argue that a general theory of trust in networks of humans and computers must be built on both a theory of behavioral trust and a theory of computational trust. This argument is motivated by [...]

Martyn Plummer’s Secret JAGS Blog

Martyn Plummer, the creator of the open-source, C++, graphical-model compiler JAGS (aka “Just Another Gibbs Sampler”), runs a forum on the JAGS site that has a very similar feel to the mail-bag posts on this blog. Martyn answers general statistical computing questions (e.g., why slice sampling rather than Metropolis-Hastings?) and general modeling (e.g., why won’t [...]

Stan uses Nuts!

We interrupt our usual program of Ed Wegman Gregg Easterbrook Niall Ferguson mockery to deliver a serious update on our statistical computing project. Stan (“Sampling Through Adaptive Neighborhoods”) is our new C++ program (written mostly by Bob Carpenter) that draws samples from Bayesian models. Stan can take different sorts of inputs: you can write the [...]

Validation of Software for Bayesian Models Using Posterior Quantiles

I love this stuff: This article presents a simulation-based method designed to establish the computational correctness of software developed to fit a specific Bayesian model, capitalizing on properties of Bayesian posterior distributions. We illustrate the validation technique with two examples. The validation method is shown to find errors in software when they exist and, moreover, [...]
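To make the idea concrete, here is a minimal sketch of the posterior-quantile check on a toy model. Everything in it is an assumption for illustration, not the article's own examples: the model is a normal mean with a N(0,1) prior and unit-variance data, the function names (`posterior_draws`, `overconfident_draws`, `posterior_quantiles`) are hypothetical, and the summary used here (mean and variance of the quantiles) is a cruder stand-in for a formal uniformity test. The logic is: draw the parameter from the prior, simulate data, run the software, and record where the true parameter falls among the posterior draws. If the software is correct, those quantiles should look uniform on (0, 1).

```python
import random
import statistics


def posterior_draws(y, n_draws, rng):
    """Software under test: exact posterior draws for a normal mean.

    Assumed toy model: theta ~ N(0, 1), y_i ~ N(theta, 1).
    """
    n = len(y)
    mean = sum(y) / (n + 1)
    sd = (n + 1) ** -0.5
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def overconfident_draws(y, n_draws, rng):
    """Deliberately buggy sampler: posterior sd shrunk by a factor of 0.3."""
    n = len(y)
    mean = sum(y) / (n + 1)
    sd = 0.3 * (n + 1) ** -0.5
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def posterior_quantiles(sampler, n_reps=500, n_obs=10, n_draws=200, seed=1):
    """Return the posterior quantile of the true theta in each replication."""
    rng = random.Random(seed)
    quantiles = []
    for _ in range(n_reps):
        theta = rng.gauss(0.0, 1.0)                         # draw from the prior
        y = [rng.gauss(theta, 1.0) for _ in range(n_obs)]   # simulate data
        draws = sampler(y, n_draws, rng)                    # run the software
        quantiles.append(sum(d < theta for d in draws) / n_draws)
    return quantiles


q_good = posterior_quantiles(posterior_draws)
q_bad = posterior_quantiles(overconfident_draws)

# A Uniform(0, 1) variable has mean 1/2 and variance 1/12.
print("correct sampler:", statistics.mean(q_good), statistics.pvariance(q_good))
print("buggy sampler:  ", statistics.mean(q_bad), statistics.pvariance(q_bad))
```

With the correct sampler the quantiles should have mean near 1/2 and variance near 1/12; the overconfident sampler piles quantiles up near 0 and 1, which inflates the variance.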

Tempering and modes

Gustavo writes: Tempering should always be done in the spirit of *searching* for important modes of the distribution. If we assume that we know where they are, then there is no point to tempering. Now, tempering is actually a *bad* way of searching for important modes, it just happens to be easy to program. As [...]

Wickham R short course

Hadley writes: I [Hadley] am going to be teaching an R development master class in New York City on Dec 12-13. The basic idea of the class is to help you write better code, focused on the mantra of “do not repeat yourself”. In day one you will learn powerful new tools of abstraction, allowing [...]

MacKay update: where 12 comes from

In reply to my question, David MacKay writes: You said that you can imagine rounding up 9 to 10 – which would be elegant if we worked in base 10. But in the UK we haven’t switched to base 10 yet, we still work in dozens and grosses. (One gross = 12^2 = 144.) So I [...]

David MacKay sez . . . 12??

I’ve recently been reading David MacKay’s 2003 book, Information Theory, Inference, and Learning Algorithms. It’s great background for my Bayesian computation class because he has lots of pictures and detailed discussions of the algorithms. (Regular readers of this blog will not be surprised to hear that I hate all the Occam-factor stuff that MacKay talks [...]

Bayesian inference for the parameter of a uniform distribution

Subhash Lele writes: I was wondering if you might know some good references to Bayesian treatment of parameter estimation for U(0,b) type distributions. I am looking for cases where the parameter is on the boundary. I would appreciate any help and advice you could provide. I am, in particular, looking for an MCMC (preferably in [...]
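For orientation only, here is a small sketch of the textbook conjugate case, which is not necessarily what the correspondent is after (he asks about MCMC, and this samples directly). It assumes a Pareto prior on b with hypothetical hyperparameters `prior_scale` and `prior_shape`; the function names are likewise made up for the example. Under that assumption the posterior for b given data from U(0,b) is again Pareto, with scale max(prior_scale, max(y)) and shape prior_shape + n, so the parameter sits on the boundary defined by the largest observation.

```python
import random
import statistics


def uniform_upper_posterior(y, prior_scale=1.0, prior_shape=1.0):
    """Posterior Pareto (scale, shape) for b in U(0, b), assuming a Pareto prior."""
    scale = max(prior_scale, max(y))   # b can never be below the largest observation
    shape = prior_shape + len(y)       # each observation adds one to the shape
    return scale, shape


def draw_b(scale, shape, n_draws, rng):
    """Direct (non-MCMC) draws from Pareto(scale, shape)."""
    # random.paretovariate(shape) is Pareto with scale 1, so rescale it.
    return [scale * rng.paretovariate(shape) for _ in range(n_draws)]


rng = random.Random(0)
true_b = 5.0                                    # simulated truth, for illustration
y = [rng.uniform(0.0, true_b) for _ in range(50)]

scale, shape = uniform_upper_posterior(y)
draws = draw_b(scale, shape, 4000, rng)
posterior_mean = shape * scale / (shape - 1.0)  # valid because shape > 1

print("largest observation:", max(y))
print("analytic posterior mean:", posterior_mean)
print("Monte Carlo posterior mean:", statistics.mean(draws))
```

Every draw lies at or above the largest observation, which is the boundary behavior the question is concerned with.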

Web-friendly visualizations in R

Aleks points me to this new tool from Wojciech Gryc. Right now I save my graphs as pdfs or pngs and then upload them to put them on the web. I expect I’ll still be doing this for awhile—I like having full control of what my graphs look like—but Gryc’s default plots might be useful [...]

An interweaving-transformation strategy for boosting MCMC efficiency

Yaming Yu and Xiao-Li Meng write in with a cool new idea for improving the efficiency of Gibbs and Metropolis in multilevel models: For a broad class of multilevel models, there exist two well-known competing parameterizations, the centered parameterization (CP) and the non-centered parameterization (NCP), for effective MCMC implementation. Much literature has been devoted to [...]

Why it doesn’t make sense to chew people out for not reading the help page

Karl Broman writes: Barry Rowlingson gave an interesting talk at UseR 2011, “Why R-help must die!” He suggested the Q-and-A type sites Stack Overflow (on programming) and Cross Validated (on statistics), both part of Stack Exchange. I haven’t used R-help recently but I do occasionally send people there. Just to see what was going on [...]

DBQQ rounding for labeling charts and communicating tolerances

This is a mini research note, not deserving of a paper, but perhaps useful to others. It reinvents what has already appeared on this blog. Let’s say we have a line chart with numbers between 152.134 and 210.823, with the mean of 183.463. How should we label the chart with about 3 tics? Perhaps 152.132, [...]

Hamiltonian Monte Carlo stories

Tomas Iesmantas had asked me for advice on a regression problem with 50 parameters, and I’d recommended Hamiltonian Monte Carlo. A few weeks later he reported back:

R and Google Visualization

Eric Tassone writes: Here’s something that may be of interest and useful to your readers, and which I [Tassone] am just now checking out myself. It links R and the Google Visualization API/Google Chart Tools to make Motion Charts (as used in the well known Hans Rosling TED talk) easier to create directly in R. [...]