In connection with this workshop, I was asked to write a few paragraphs describing my perspective on “the current and near-term future state of the statistical sciences you are most familiar with.” Here’s what I wrote:
I think that the field of statistics has a core at any given time, but that core changes over time: there are different paradigmatic ways to solve problems, and different ones dominate in different eras.
A hundred or a hundred and fifty years ago, the thing to do was to identify a phenomenon of interest with some probability distribution and then use the mathematics of that distribution to gain insight into the underlying process. Thus, for example, if certain data looked like they came from a normal distribution, one could surmise that the values in question arose by adding many small independent pieces. If the data looked like they came from a binomial distribution, that would imply independence and equal probabilities. Waiting times that followed an exponential distribution could be considered as coming from a memoryless process. And so forth. I remember in graduate school hearing rumors of Pearson’s system in which he characterized 12 classes of distributions, or something like that. It all sounds silly now, but it’s not so ridiculous, especially if you keep in mind that the name of the game is understanding the process, not just fitting the data. The point of picking the right distribution is to capture general features of the underlying system.
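The normal-from-sums idea above is just the central limit theorem, and it is easy to see in simulation. The sketch below (my own illustration, not from the original text; all names and parameters are made up) sums many small independent uniform pieces and checks that the result has roughly the mean and spread the theory predicts.

```python
# Illustrative sketch: summing many small independent pieces yields an
# approximately normal distribution (the central limit theorem).
import random
import statistics

random.seed(1)
n_pieces, n_samples = 100, 5000

# Each observation is the sum of 100 small independent uniform(-0.1, 0.1) pieces.
sums = [sum(random.uniform(-0.1, 0.1) for _ in range(n_pieces))
        for _ in range(n_samples)]

mean = statistics.mean(sums)
sd = statistics.stdev(sums)
# Theory: mean 0, sd = sqrt(n_pieces * 0.2**2 / 12) ≈ 0.577.
print(round(mean, 3), round(sd, 3))
```

A histogram of `sums` would look close to a bell curve even though each individual piece is uniform, which is exactly the kind of inference from shape to process the paragraph describes.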
This approach, of coming up with low-dimensional theoretical distributions that capture important aspects of the data, is still being used in some areas of statistics (for example, some of the research on social networks and power laws), but overall I’d say that statisticians have moved toward a more descriptive approach based on conditional modeling: that is, regression, generalized linear models, nonparametric regression, and so forth. The idea is typically to identify the factors that influence some outcome of interest.
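The conditional-modeling approach can be sketched in its simplest form, ordinary least-squares regression. This is my own minimal illustration (the data and coefficients are invented for the example): we simulate an outcome driven by one factor plus noise, then recover the intercept and slope from the data.

```python
# Minimal sketch of conditional modeling: least-squares fit of y = a + b*x.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=200)  # true intercept 2, slope 0.5

X = np.column_stack([np.ones_like(x), x])        # design matrix with intercept
coef, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares solution
intercept, slope = coef
print(round(intercept, 2), round(slope, 2))
```

The fitted coefficients land near the true values, and the same template (an outcome modeled conditionally on predictors) extends to generalized linear models and nonparametric regression.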
Meanwhile, other approaches to statistics go in and out of favor. It’s my impression that statistics in the twenty or thirty years following World War II was strongly influenced by the war experiences of various mathematicians who worked on problems of statistical inference and operations research. What we saw after the war was a framing of many statistical problems as optimization, and a general attitude that optimization (and, to a lesser extent, game theory) was the fundamental principle underlying all the data sciences.
More recently, starting in the 1960s and 1970s and continuing through today, there has been the idea that steadily increasing computing power should change how we collect and analyze data, to the extent that previous methods were limited by the constraint that they have closed-form expressions or be easy to compute. Even with fast computing, though, we still need as much mathematical understanding as we can get.