Sociologist Fabio Rojas reports on “a conversation I [Rojas] have had a few times with statisticians”:
Rojas: “What does your research tell us about a sample of, say, a few hundred cases?”
Statistician: “That’s not important. My result works as n → ∞.”
Rojas: “Sure, that’s a fine mathematical result, but I have to estimate the model with, like, totally finite data. I need inference, not limits. Maybe the estimate doesn’t work out so well for small n.”
Statistician: “Sure, but if you have a few million cases, it’ll work in the limit.”
Rojas: “Whoa. Have you ever collected, like, real world network data? A million cases is hard to get.”
The conversation continues in this frustrating vein. Rojas writes:
This illustrates a fundamental issue in statistics (and other sciences). Once you formalize a model and work mathematically, you are tempted to focus on what is mathematically interesting instead of the underlying problem motivating the science. . . .
We have the same issue in statistics. “Statistics” can mean “the mathematics of distributions and other functions arising in statistical models.” Or it can mean the traditional problems of statistics like inference, measurement, model estimation, sampling, data collection/management, forecasting, and description. The problem for a guy like me (a social scientist with real data) is that the label “statistician” often denotes someone who is actually a mathematician who happens to be interested in distributions. . . . What I really want is a nuts and bolts person to help me solve problems.
My first reaction—actually, my main reaction—is that Rojas hangs out with the wrong sort of statistician. Following the links, I see that Rojas works at Indiana University, which features a large statistics department. I suspect he had the misfortune to encounter “a mathematician who happens to be interested in distributions” and he didn’t realize he could shop around among the many statisticians in that department who work on applied social research.
On the other hand, it’s a bad sign that Rojas reports having this conversation multiple times. I thought that statisticians nowadays knew they’re supposed to be helpful on real problems. That “n -> infinity” thing seems so old-fashioned! I’d like to believe that Rojas was just having some bad luck, but maybe there’s more of this bad stuff going on than I realized. Or maybe it was just a communication problem?
It’s hard for me to imagine a statistician in 2012 telling a sociologist, “if you have a few million cases, it’ll work in the limit,” except as a joke, as an ironic comment on the limitations of some of our theory. But perhaps that just reflects the poverty of my imagination.