
Linear regression is not dead, and please don’t call it OLS

Lee Sigelman writes,

In the latest issue of The Political Methodologist, James S. Krueger and Michael S. Lewis-Beck examine the current standing of the time-honored but oft-dismissed-as-passe ordinary least squares regression model in political science research. . . . Krueger and Lewis-Beck report that . . . The OLS regression model accounted for 31% of the statistical methods employed in these articles. . . . “Less sophisticated” statistical methods — those that would ordinarily be covered before OLS in a methods course — accounted for 21% of the entries. . . . Just one in six or so of the articles that reported an OLS-based analysis went on to report a “more sophisticated” one as well. . . . OLS is not dead. On the contrary, it remains the principal multivariate technique in use by researchers publishing in our best journals. Scholars should not despair that possession of quantitative skills at an OLS level (or less) bars them from publication in these top outlets.

I have a few thoughts on this:

1. I don’t like the term OLS (“ordinary least squares”). I prefer the term “linear regression” or “linear model.” Least squares is an optimization problem; what’s important (in the vast majority of cases I’ve seen) is the model. For example, if you still do least squares but you change the functional form of the model so it’s no longer linear, that’s a big deal. But if you keep the linearity and change to a different optimization problem (for example, least absolute deviation), that generally doesn’t matter much. It might change the estimate, and that’s fine, but it’s not changing the key part of the model.
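To see why the model matters more than the optimization problem, here is a minimal sketch (simulated data, not from the post): the same linear model fit two ways, by least squares and by least absolute deviation. The two losses give similar slope estimates; swapping the loss is a smaller change than swapping the functional form would be.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data from a linear model, with one outlier added so the two
# loss functions don't coincide exactly.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 0.5, size=x.size)
y[-1] += 10  # a single outlier

X = np.column_stack([np.ones_like(x), x])  # design matrix: intercept + slope

# Least squares: the usual closed-form linear-regression fit.
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Least absolute deviation: same linear model, different optimization
# problem (non-smooth, so use a derivative-free method).
res = minimize(lambda b: np.abs(y - X @ b).sum(), x0=beta_ls,
               method="Nelder-Mead")
beta_lad = res.x

print(beta_ls, beta_lad)  # similar estimates; LAD is less pulled by the outlier
```

Both fits recover roughly the same line; the linearity assumption, not the choice between squared and absolute error, is what defines the analysis.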

2. I like simple methods. Gary and I once wrote a paper that had no formulas, no models, only graphs. It had 10 graphs, many made of multiple subgraphs. (Well, we did have one graph that was based on some fitted logistic regressions–an early implementation of the secret weapon–but the other 9 didn’t use models at all.) And, contrary to Cosma’s comment on John’s entry, our findings were right, not just published. The purpose of the graphical approach was not simply to convey results to the masses, and certainly not because it was all that we knew how to do. It just seemed like the best way to do this particular research. Since then, we’ve returned to some of these ideas using models, but I think we learned a huge amount from these graphs (along with others that didn’t make it into the paper).

3. Sometimes simple methods can be justified by statistical theory. I’m thinking here of our approach of splitting a predictor at the upper quarter or third and the lower quarter or third. (Although, see the debate here.)
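A quick sketch of that split-the-predictor idea, on simulated data (my illustration, not from the paper): discard the middle of x and compare the mean of y in the upper third to the lower third. Dividing that difference by the difference in the group means of x gives a simple slope estimate that tracks the full regression.

```python
import numpy as np

# Simulated data from a simple linear relation.
rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 1.0 + 0.5 * x + rng.normal(size=1000)

# Split x at the lower and upper thirds; drop the middle third.
lo, hi = np.quantile(x, [1 / 3, 2 / 3])
lower, upper = x <= lo, x >= hi

# Difference in mean outcomes, scaled by the difference in mean x,
# as a crude slope estimate.
slope_split = (y[upper].mean() - y[lower].mean()) / (x[upper].mean() - x[lower].mean())
slope_ols = np.polyfit(x, y, 1)[0]

print(slope_split, slope_ols)  # both near the true slope of 0.5
```

The appeal is that the comparison of thirds is easy to explain and to show graphically, while giving up only a modest amount of efficiency relative to the full regression.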

4. Other times, complex models can be more robust than simple models and easier to use in practice. (Here I’m thinking of bayesglm.)

5. Sometimes it helps to run complicated models first, then when you understand your data well, you can carefully back out a simple analysis that tells the story well. Conversely, after fitting a complicated model, you can sometimes make killer graphs.


  1. Cosma says:

    I confess I was made somewhat cranky by the spin at the end – to paraphrase, don't worry if you don't really understand what you're doing, you can still get away with cookbook regression. Obviously (?) I prefer the simplest and most intuitive method which will actually answer the question; I've just read (and refereed) too many papers where people used linear regression because It Is The Thing To Use, whether substantive assumptions (e.g., linearity!) made sense or not…

  2. Lee Sigelman says:

    Just to be clear: The spin at the end was that of the authors of the analysis, not of the blogger (i.e., me). I expect that on this matter (though apparently not with regard to the way academicians dress), I'm on the same page as Cosma.

  3. LemmusLemmus says:

    I don't think calling it "OLS regression" is a big problem. People are going to assume it is a linear one unless you say otherwise, no? In fact, I don't remember ever having seen a nonlinear OLS regression.

  4. David Kane says:

    Let me steal and second an observation from one of my co-authors, a current statistics Ph.D. student who has worked at both Yahoo and Google:

    I have never seen a real applied example in which fancy machine learning stuff (neural networks, random forests, support vector machines, genetic algorithms and so on) did meaningfully better than a well-designed linear model. Get the variables correct (often hard to do!) and a linear model is all you need.

  5. Andrew says:


    I didn't literally mean that the term "OLS" would mislead people. I just feel that it emphasizes the wrong aspect of the problem, and I could see how this could confuse people, especially students.

    For example, how do you generalize "OLS"? Add weights, change "least squares" to "least absolute deviation," etc. These are not completely dead ends but they're not the most important issues. Contrariwise, how do you generalize "linear regression"? Obviously, to "nonlinear regression," which I think is a more helpful way to generalize.


    I'm sympathetic to this point, and it's been true most of the time–for example, in my 1990 paper with King, we used a complicated mixture model for votes for congressional candidates, but then by the time we got to our 1994 paper, we'd added incumbency as a predictor and we were able to get rid of the mixture model entirely.

    But I wouldn't go as far as your coauthor. I don't know enough about machine learning to comment, but I've certainly found the need for fancy nonlinear models at times, for example in my 1996 paper with Bois and Jiang on toxicology. And I have to assume that those machine learning people know what they're doing too! Finally, Jennifer certainly claims that George and McCulloch's BART works better than the standard procedures for causal inference in observational studies with many predictors.

  6. kio says:

    Scatter plots and linear regression are a foundation of the hard sciences. Assessment of independent acts of measurement leads directly to plots and fitted lines. I would say that the results of linear regression provide our basic understanding of the patterns of the physical world. In fact, all fundamental laws are linear regressions between measured and predicted values.

    For processes evolving over time, however, linear regression is not so helpful. There are two possibilities:
    1. There is no true link between the studied variables. Then linear regression may lead to the erroneous conclusion that some link is present. Cointegration helps, but does not save you here.

    2. There is a long-term equilibrium link between these variables. Then regression is inferior to an integral approach. Cumulative values, where uncorrelated noise cancels itself out, provide much higher resolution for models, since they are characterized by a higher S/N ratio.
    Cointegration tests are very dangerous when applied to real (physics-like) links because differentiation severely diminishes the S/N ratio.

  7. ti says:

    I've used a negative binomial regression. Is it still useful to do a scatter plot and a regression line, or does the graph not fit the negative binomial regression?