Over the years I’ve written a dozen or so journal articles that have appeared with discussions, and I’ve participated in many published discussions of others’ articles as well. I get a lot out of these article-discussion-rejoinder packages, in all three of my roles as reader, writer, and discussant. Part 1: The story of an unsuccessful [...]
The Folk Theorem of Statistical Computing
From an email I received the other day: Things are going much better now — it’s interesting, it feels like with both of my models, parameters are slow to converge or get “stuck” and have trouble mixing when the model is somehow misspecified. See here for a statement of the folk theorem.
Continued fractions!!
Upon reading this note by John Cook on continued fractions, I wrote: If you like continued fractions, I recommend you read the relevant parts of the classic Numerical Methods That Work. The details are probably obsolete but it’s fun reading (at least, if you think that sort of thing is fun to read). I then [...]
It’s binless! A program for computing normalizing functions
Zhiqiang Tan writes: I have created an R package to implement the full likelihood method in Kong et al. (2003). The method can be seen as a binless extension of the so-called Weighted Histogram Analysis Method (WHAM) widely used in physics and chemistry. The method has also been introduced to the physics literature and called the [...]
Excel-bashing
In response to the latest controversy, a statistics professor writes: It’s somewhat surprising to see Very Serious Researchers (apologies to Paul Krugman) using Excel. Some years ago, I was consulting on a trademark infringement case and was trying (unsuccessfully) to replicate another expert’s regression analysis. It wasn’t until I had the brainstorm to use Excel [...]
NUTS discussed on Xi’an’s Og
Xi’an’s Og (aka Christian Robert’s blog) is featuring a very nice presentation of NUTS by Marco Banterle, with discussion and some suggestions. I’m not even sure how they found Michael Betancourt’s paper on geometric NUTS — I don’t see it on the arXiv yet, or I’d provide a link.
Data problems, coding errors…what can be done?
This post is by Phil. A recent post on this blog discusses a prominent case of an Excel error leading to substantially wrong results from a statistical analysis. Excel is notorious for this because it is easy to add a row or column of data (or intermediate results) but forget to update equations so that [...]
Stan 1.3.0 and RStan 1.3.0 Ready for Action
The Stan Development Team is happy to announce that Stan 1.3.0 and RStan 1.3.0 are available for download. Follow the links on the Stan home page: http://mc-stan.org/ Please let us know if you have problems updating. Here’s the full set of release notes. v1.3.0 (12 April 2013). Enhancements, Modeling Language: forward sampling (random [...]
Stan at Google this Thurs and at Berkeley this Fri noon
Michael Betancourt will be speaking at Google and at the University of California, Berkeley. The Google talk is closed to outsiders (but if you work at Google, you should go!); the Berkeley talk is open to all: Friday March 22, 12:10 pm, Evans Hall 1011. Title of talk: Stan: Practical Bayesian Inference with Hamiltonian Monte [...]
How do I make my graphs?
Someone who wishes to remain anonymous writes:
Cool GSS training video! And cumulative file 1972-2012!
Felipe Osorio made the above video to help people use the General Social Survey and R to answer research questions in social science. Go for it! Meanwhile, Tom Smith reports: The initial release of the General Social Survey (GSS), cumulative file for 1972-2012 is now on our website. Codebooks and copies of questionnaires will be [...]
Stan 1.2.0 and RStan 1.2.0
Stan 1.2.0 and RStan 1.2.0 are now available for download. See: http://mc-stan.org/ Here are the highlights. Full Mass Matrix Estimation during Warmup: Yuanjun Gao, a first-year grad student here at Columbia (!), built a regularized mass-matrix estimator. This helps for posteriors with high correlation among parameters and varying scales. We’re still testing this ourselves, so [...]
Stan in L.A. this Wed 3:30pm
Michael Betancourt will be speaking at UCLA: The location for refreshment is in room 51-254 CHS at 3:00 PM. The place for the seminar is at CHS 33-105A at 3:30pm – 4:30pm, Wed 6 Mar. ["CHS" stands for Center for Health Sciences, the building of the UCLA schools of medicine and public health. Here's a [...]
PyStan!
Stan is written in C++ and can be run from the command line and from R. We’d like for Python users to be able to run Stan as well. If anyone is interested in doing this, please let us know and we’d be happy to work with you on it. Stan, like Python, is completely [...]
“Is machine learning a subset of statistics?”
Following up on our previous post, Andrew Wilson writes: I agree we are in a really exciting time for statistics and machine learning. There has been a lot of talk lately comparing machine learning with statistics. I am curious whether you think there are many fundamental differences between the fields, or just superficial differences — [...]
An AI can build and try out statistical models using an open-ended generative grammar
David Duvenaud writes: I’ve been following your recent discussions about how an AI could do statistics [see also here]. I was especially excited about your suggestion for new statistical methods using “a language-like approach to recursively creating new models from a specified list of distributions and transformations, and an automatic approach to checking model fit.” [...]
Rcpp class on Sat 9 Mar in NYC
Join Dirk Eddelbuettel for six hours of detailed and hands-on instructions and discussions around Rcpp, RInside, RcppArmadillo, RcppGSL and other packages . . . Rcpp has become the most widely-used language extension for R. Currently deployed by 103 CRAN packages and a further 10 BioConductor packages, it permits users and developers to pass “whole R [...]
F-f-f-fake data
Tiago Fragoso writes: Suppose I fit a two-stage regression model: Y = a + bx + e, a = cw + d + e1. I could fit it all in one step by using MCMC, for example (my model is more complicated than that, so I’ll have to do it by MCMC). However, I [...]
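For readers who want to try the fake-data idea on a model of this form, here is a minimal sketch in Python. It is not taken from the post. It assumes a grouped reading of the two-stage model, in which each group j has an intercept a_j = c*w_j + d + e1_j and each observation in that group has y = a_j + b*x + e. The function name, the parameter values, and the uniform predictors are all hypothetical choices made for illustration.

```python
import random


def simulate_two_stage(n_groups=8, n_per_group=20, b=2.0, c=1.5, d=0.5,
                       sigma_y=1.0, sigma_a=0.5, seed=1):
    """Simulate fake data from an assumed grouped two-stage model.

    Stage 2 (group level):  a_j = c * w_j + d + e1_j,  e1_j ~ Normal(0, sigma_a)
    Stage 1 (data level):   y   = a_j + b * x + e,     e    ~ Normal(0, sigma_y)

    All names and default values here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    rows = []        # one (group, w, x, y) tuple per observation
    intercepts = []  # the true a_j for each group, kept for later comparison
    for j in range(n_groups):
        w = rng.uniform(-1.0, 1.0)               # group-level predictor
        a = c * w + d + rng.gauss(0.0, sigma_a)  # group intercept
        intercepts.append(a)
        for _ in range(n_per_group):
            x = rng.uniform(-1.0, 1.0)           # observation-level predictor
            y = a + b * x + rng.gauss(0.0, sigma_y)
            rows.append((j, w, x, y))
    return rows, intercepts


rows, intercepts = simulate_two_stage()
print(len(rows), len(intercepts))
```

Because the true values of b, c, d and the group intercepts are known, whatever fitting procedure is used (one step or two) can be run on these rows and its estimates compared against the values that generated the data.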
iPython Notebook
Burak Bayramli writes: I wanted to inform you about iPython Notebook technology, which allows markup and Python code to reside in one document. Someone ported one of your examples from ARM. An iPynb file is actually a live document that can be downloaded and rerun locally, so a change of code in the document means a change of images and results. Graphs [...]
The new Stan 1.1.1, featuring Gaussian processes!
We just released Stan 1.1.1 and RStan 1.1.1. As usual, you can find download and install instructions at: http://mc-stan.org/ This is a patch release and is fully backward compatible with Stan and RStan 1.1.0. The main thing you should notice is that the multivariate models should be much faster and all the bugs reported for [...]
Too many MC’s not enough MIC’s, or What principles should govern attempts to summarize bivariate associations in large multivariate datasets?
Justin Kinney writes: Since your blog has discussed the “maximal information coefficient” (MIC) of Reshef et al., I figured you might want to see the critique that Gurinder Atwal and I have posted. In short, Reshef et al.’s central claim that MIC is “equitable” is incorrect. We [Kinney and Atwal] offer mathematical proof that the [...]
Class on computational social science this semester, Fridays, 1:00-3:40pm
Sharad Goel, Jake Hofman, and Sergei Vassilvitskii are teaching this awesome class on computational social science this semester in the applied math department at Columbia. Here’s the course info. You should take this course. These guys are amazing.
R package for Bayes factors
Richard Morey writes: You and your blog readers may be interested to know that we’ve released a major new version of the BayesFactor package to CRAN. The package computes Bayes factors for linear mixed models and regression models. Of course, I’m aware you don’t like point-null model comparisons, but the package does more than [...]
Software is as software does
We had a recent discussion about statistics packages where people talked about the structure and capabilities of different computer languages. One thing I wanted to add to this discussion is some sociology. To me, a statistics package is not just its code, it’s also its community, it’s what people do with it. R, for example, [...]
The statistics software signal
Tyler Cowen links to a post by Sean Taylor, who writes the following about users of R: You are willing to invest in learning something difficult. You do not care about aesthetics, only availability of packages and getting results quickly. To me, R is easy and Sas is difficult. I once worked with some students [...]