One of the fastest generalized linear model implementations

Aleks points me to this new article by the Stanford triplets:

We [Friedman, Hastie, and Tibshirani] develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include L1 (the lasso), L2 (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
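
To make the idea concrete, here is a toy numpy version of the coordinate-wise update for the Gaussian case with an elastic-net penalty. This is only a sketch of the general approach, not their implementation (theirs is in Fortran, with warm starts, active sets, and other tricks); the function names and the assumption of standardized predictors and centered response are mine.

    import numpy as np

    def soft_threshold(z, gamma):
        # S(z, gamma) = sign(z) * max(|z| - gamma, 0)
        return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

    def elastic_net_cd(X, y, lam, alpha=0.5, n_sweeps=200):
        # Cyclical coordinate descent for
        #   (1/2n) ||y - X b||^2 + lam * [alpha ||b||_1 + (1 - alpha)/2 ||b||_2^2],
        # assuming the columns of X are standardized (mean 0, variance 1) and y is centered.
        n, p = X.shape
        b = np.zeros(p)
        r = y - X @ b
        for _ in range(n_sweeps):
            for j in range(p):
                r = r + X[:, j] * b[j]          # partial residual, leaving out predictor j
                z = X[:, j] @ r / n             # simple least-squares coefficient on that residual
                b[j] = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
                r = r - X[:, j] * b[j]
        return b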

I wonder if these methods could be applied to our problems, things like this. Here we’re fitting a hierarchical model–yes, we have normal priors but I don’t think it’s quite right to call it “ridge regression.” The algorithm of Friedman, Hastie, and Tibshirani might still be adapted to these more complex models. (I’m assuming that their algorithm is already way faster than lmer or bayesglm.)
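
On the normal-priors-versus-ridge point: if the prior scale were fixed, the posterior mode really would be a ridge fit, which you can compute by ordinary least squares on augmented data. The hierarchical part, where the prior scale is itself estimated from the data, is exactly what this little trick (a sketch of mine, with made-up argument names) doesn't cover.

    import numpy as np

    def normal_prior_fit(X, y, sigma_beta, sigma_y=1.0):
        # With a *fixed* N(0, sigma_beta^2) prior on each coefficient, the posterior mode
        # is ridge regression with lambda = (sigma_y / sigma_beta)^2, obtained from ordinary
        # least squares on data augmented with one pseudo-observation per coefficient.
        n, p = X.shape
        lam = (sigma_y / sigma_beta) ** 2
        X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
        y_aug = np.concatenate([y, np.zeros(p)])
        beta_hat, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
        return beta_hat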

P.S. Oooh, they have ugly tables. I’ll know that Stanford Statistics has finally taken over the world when they fully switch from tables to graphs.

P.P.S. I’m glad to see these top guys publishing in the Journal of Statistical Software. We’ve been publishing there lately–sometimes it seems like just the right place–but we worried that nobody was reading it. It’s good to see this journal as a place for new research.

5 thoughts on “One of the fastest generalized linear model implementations”

  1. Efficiency is not the same thing as scalability. Efficiency is about speed and memory use (i.e., time and space), while scalability is about being able to solve large problems. There's often a tradeoff between the two. For example, moving data from memory to secondary storage usually increases scalability but decreases efficiency.

    The examples for which Friedman et al. report timings are all what I'd consider tiny. Other packages, like Madigan and Lewis's BMR, Langford et al.'s Vowpal Wabbit, and Bottou's SGD, all consider much larger problems. It'd be interesting to see how well Friedman et al.'s program scales.

    It's also incredibly tricky to calibrate convergence requirements across systems, say, to the same number of decimal places of accuracy. With stochastic gradient methods you often get good predictive performance quickly, with only a decimal place or two of accuracy. But stochastic gradient can take a long, long time to get to many decimal places of accuracy.

    It's also tricky to evaluate the algorithm rather than the programming language, coding skill, and platform. (Did anyone else notice this thing was written in Fortran?)

    What do you make of the elastic net priors? Have you (Andrew or Aleks) evaluated them like you did with the L0, L1 and Cauchy priors?

    Ironically, given your criticism of Friedman et al.'s tables, I love their graphs of the regularization paths, which I first saw in their machine learning text.

    The whole regularization path idea reminded me of Andrew's suggestion to fit ever more complex models using the previous model's parameters as initial values in MCMC.
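
    The warm-start trick is easy to play with directly. Here is a small sketch of my own (using scikit-learn's elastic net, nothing to do with their Fortran code): refit down a ladder of penalties, reusing each solution as the starting values for the next.

        import numpy as np
        from sklearn.linear_model import ElasticNet

        # Toy data: 200 observations, 50 predictors, 5 of them active.
        rng = np.random.default_rng(0)
        X = rng.standard_normal((200, 50))
        y = X[:, :5] @ np.ones(5) + rng.standard_normal(200)

        # Fit down a ladder of penalties, from strong to weak. With warm_start=True,
        # each fit starts from the previous (more heavily penalized) solution, which
        # is the same idea as following the regularization path.
        model = ElasticNet(alpha=1.0, l1_ratio=0.5, warm_start=True, max_iter=10_000)
        path = []
        for alpha in np.geomspace(1.0, 1e-3, num=20):
            model.set_params(alpha=alpha)
            model.fit(X, y)
            path.append((alpha, model.coef_.copy()))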

  2. Andrew:

    As for nobody reading JSS: it gets 15,000 hits per day, 1,000-3,000 hits per article over time, and, at least by some indicators, a higher impact than JCGS and CSDA. You can check the Scopus impact factors on the homepage at http://www.jstatsoft.org. So no need to worry.

    — Jan

  3. The penalty term that corresponds to a t-distribution prior is of the form log(1 + (x - x')^2), so if their approach can be adapted to this penalty, all will be well.
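
    Spelling that out, a t prior with \nu degrees of freedom, center x', and scale s contributes a penalty (its negative log density, up to a constant) of

        -\log p(x) = \frac{\nu + 1}{2}\,\log\!\left(1 + \frac{(x - x')^2}{\nu\, s^2}\right)

    which for \nu = 1 (the Cauchy case) is log(1 + ((x - x')/s)^2). One wrinkle is that, unlike the lasso, ridge, and elastic-net penalties the paper covers, this penalty is not convex, so their convexity assumptions wouldn't apply directly.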

  4. Jan: Cool.

    Aleks: I wasn't really thinking of our weakly informative prior for regression coefficients. I was thinking of hierarchical models, where the hyperparameters are estimated from the data. Such models are fit iteratively, and it seems like it might be possible to fold those iterations into an existing iterative algorithm such as this one.
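
    As a crude picture of what that folding-in could look like (a sketch of my own, not lmer, bayesglm, or the Friedman et al. algorithm): alternate between a penalized fit given the current hyperparameter and a re-estimate of the hyperparameter given the current coefficients.

        import numpy as np

        def iterated_ridge(X, y, sigma_y=1.0, n_outer=25):
            # Alternate between (a) a ridge fit given the current prior scale and
            # (b) a crude update of the prior scale from the fitted coefficients.
            # Each outer step could just as well warm-start an inner coordinate-descent
            # solver instead of solving the normal equations directly.
            n, p = X.shape
            sigma_beta = 1.0            # starting guess for the hyperparameter
            beta = np.zeros(p)
            for _ in range(n_outer):
                lam = (sigma_y / sigma_beta) ** 2
                beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
                sigma_beta = np.sqrt(np.mean(beta ** 2)) + 1e-8
            return beta, sigma_beta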
