Felipe Nunes writes:

I have many friends working with data that they claim should be considered a ‘population’. For example: the universe of bills presented in a Congress, the roll call votes of all deputies in a legislature, a survey of all deputies in a country, the outcomes of an election, or the set of electoral institutions around the world. Because of the nature of these data, we do not know how to interpret the p-value. I have seen many arguments made, but I have never seen a formal response to the question, so I don’t know what to say. The most common arguments among the community of young researchers in Brazil are: (1) don’t interpret the p-value when you have a population, but don’t infer anything either; (2) interpret the p-value because measurement error is also present; (3) there is no such thing as a population, so always look at p-values; (4) don’t worry about the p-value, interpret the coefficients substantively; and (5) if you are frequentist you interpret the p-value, if you are Bayesian you don’t.

If you have a paper or any other reference that could help with this discussion, please point me to it as well.

Here’s my reply.

I run into this problem in my professional life all the time. The typical scenario is that we know x_1 for the entire population 1 and x_2 for the entire population 2. The question is then asked: “Is x_1 different from x_2?”

If the analyst has no statistical training they will just compare the two numbers and answer the question. If they want to know whether they are “significantly” different, they will determine if the difference x_1-x_2 would be of practical importance to the people whose lives depend on it.

Statistically trained analysts, however, will inevitably run some comparison-of-means significance test and report whether the difference was statistically significant. I have no idea what they think they’re doing with this or what they think the answer means.

That’s not the end of the story though. It turns out there is a kind of small measurement error in the numbers x_1, x_2. The kind of error is very different from what is usually encountered and requires a careful, new analysis. But if you take the time to do the analysis, you can then answer the statistical question “could the difference between x_1 and x_2 reasonably be due to this unusual measurement error?”
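To make the question concrete, here is a minimal Monte Carlo sketch. All numbers are made up for illustration, and the independent Gaussian error model is an assumption of mine, not the “unusual” error structure described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population-level measurements (assumed values for illustration).
x1, x2 = 10.3, 10.9          # observed values for populations 1 and 2
sigma = 0.5                  # assumed std. dev. of the measurement error

# Monte Carlo: if both true values were equal, how often would
# measurement error alone produce a gap at least this large?
n_sim = 100_000
simulated_gaps = np.abs(rng.normal(0, sigma, n_sim) - rng.normal(0, sigma, n_sim))
p = np.mean(simulated_gaps >= abs(x1 - x2))
print(f"Fraction of simulated gaps >= observed: {p:.3f}")
```

Note that this answers only the narrow factual question “could error alone account for the gap?”, without invoking any sampling-from-a-superpopulation story.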

The funny thing is that non-statistically trained analysts get this perfectly, but it’s a hard sell to those with a statistical background (MS/PhD level in a field that requires significant statistical coursework).

I suspect it is a failure or at least insurmountable opportunity of statistical education.

When I was asked this question in class once, I replied that the interest was probably not in the (as Peirce called it, “dead”) past population (recorded activities) but rather in the population as it would act at present or in the future, where there are always uncertainties. But then I had to admit that getting at that uncertainty would be very difficult and beyond the course. Who knows what they now think (or would do in their work to avoid being criticised)!

Unfortunately, the next mountain is that the standard error is only a well-understood (or widely understandable) measure of uncertainty when there is randomization, but this does not stop it from being misunderstood in many epidemiology studies and elsewhere. Often a sophisticated multivariate analysis of some sort is complicit in encouraging that misunderstanding, by confusing attempts to remove some confounding with having (almost completely) removed all of it.

But I also share Entsophy’s suspicion that those with some statistical training do more harm than good. It would be nice to have some empirical (even just observational) data on that, especially given that the advice for recruiting into the newfangled Data Science profession seems to suggest the best candidates have a non-statistics primary PhD with a few “good” courses in statistics.

Andrew and others got it right on the other thread (“Sometimes it’s worth making the effort to think carefully about what replications you’re interested in” – Andrew).

But that still leaves us with Radford Neal’s question: “How can such a simple issue be sooooo misunderstood?”

I think a part of the blame must be the idea (deeply embedded in the way that statistics is traditionally taught) of treating the concept of “population” rather than the concept of “model” as central. If we ditched the term “population” from our vocabulary entirely and just focused on models, this issue simply could not come up. And I don’t see that we would lose anything useful.

Or be explicit about what our model (representation) was intended to represent, i.e. the present, future, or past population.

I think it’s clearer if one thinks about an underlying generative process, which is usually what’s of interest. Then it’s clear that even the data which includes the entire population is still only a projection of the underlying phenomenon without explicitly requiring thought experiments about future/past extrapolations (which may not be of direct interest) or counterfactual realizations.
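The “projection” point can be seen in a toy simulation (the legislature size, latent propensities, and seeds below are all invented for illustration): even if we record every member’s vote, a re-run of the same generative process yields a different complete “population”.

```python
import random

random.seed(1)

# Suppose each of 100 (hypothetical) deputies has a latent propensity p_i
# of voting "yes". Observing every deputy still gives only one realization
# of the underlying generative process.
propensities = [random.uniform(0.3, 0.7) for _ in range(100)]

def one_realization(ps, rng):
    """Total 'yes' votes in one complete realization of the whole chamber."""
    return sum(rng.random() < p for p in ps)

rng_a, rng_b = random.Random(2), random.Random(3)
yes_a = one_realization(propensities, rng_a)
yes_b = one_realization(propensities, rng_b)
# Each count describes the entire "population" of deputies, yet each is
# just one draw from the same process.
print(yes_a, yes_b)
```

Under this view the question “could the observed difference arise anyway?” is well-posed even with complete data, because the process, not the roster, is the object of inference.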

Exactly – when I say model I mean a generative model. My suggestion is that all of statistics could be rewritten in terms of generative models – of course there are parts of the discipline that wouldn’t fit such a presentation, but I’m thinking those are exactly the parts that are dubious.

Such a suggestion is presumably controversial – would people here disagree with it, or would one have to go to more traditional frequentist communities to find people who are opposed?

I’m of a different cast of mind than O’Rourke, revo11, and konrad. I would say the key is to focus on concrete real facts about the universe we live in. We start out with facts (for example x_1, x_2) and want to know some other fact about our universe. For example:

A_1: The measured values x_1, x_2 are not the true values because of measurement error. If we could see the true values we’d see that x_1 < x_2.

or

A_2: The true values of x_1, x_2 for the next month will be within delta of each other.

A_1 and A_2 are real, concrete statements about the one real universe we live in. Models, probabilities, random variables, random processes, and measurements that could have happened but didn’t are all figments of our imagination. They may be useful figments, but they are useful to the extent that they allow us to connect facts like x_1, x_2 to facts like A_1, A_2.

So before an analyst focuses their attention on the phantoms and fantasies of statistics, it’s worth taking a few moments to ask: What facts do I know and what facts do I want to know?

Your position seems evilly nominalistic, or even almost to the point of being Normal Deviate and Larry W like ;-)

You may wish to read

Paul Forster, Peirce and the Threat of Nominalism, Cambridge University Press, 2011.

Or Not.

O’Rourke,

I have no opinion about nominalism either good or evil. I do however, believe the following:

We are a small portion of the universe and have only an infinitesimal number of facts about it. The goal of Science in general and statistics in particular is to relate such facts as we do have to each other in the face of that vast ignorance.

The models we use to do this sometimes reflect portions of reality, but sometimes they are almost completely fanciful. For example, the ideal gas laws were derived in the 19th century using a bewildering variety of statistical/atomic models which we now know are wrong. The resulting ideal gas equations were still correct, however, and were still just as useful and usable as if they’d been derived from “correct” models.

So I have no problems with using fanciful models to get answers. The problem I have is that statistical education and/or the Frequentist mindset causes people to believe the models are more real than the concrete facts they’re trying to relate. That’s how they keep making the kind of conceptual mistake I described in my first comment.

And that’s the answer to Radford Neal’s question: “How can such a simple issue be sooooo misunderstood?” Because instead of thinking:

“Random processes are a fiction and the data are real”

They think

“Random processes are real and the data are just one of some amorphous cloud of possible outcomes”

@Entsophy: No, no, no! I was fishing for dissenting points of view in order to learn how people think who disagree with me. Instead, you are presenting a point of view I completely agree with! :-)

I agree that the sort of questions you present are the real questions of interest. Modeling is (to my mind) the only (rigorous) way to go about answering them. I think that presenting models as a central focus of statistics would _clarify_ the distinction between models and reality. At present, the concept of a model doesn’t get enough attention and this is why people come away without a clear notion of how (e.g.) random processes are just a made-up description of reality – it would be an easy distinction to make on day 1 of Stats101. By contrast, if instead we work with the concept of “population”, it seems to slip to and fro between referring to reality and an idealised description.

I agree with revo11 and konrad that the way to look at this (and most other things in statistics) is by considering the underlying generative model. This is not a frequentist versus Bayesian question. konrad implies one may need to go to “frequentist communities” to find people opposed to this idea. I have a traditional frequentist education in statistical theory and I was taught the model perspective. In my experience, the people who are confused about this are people who do not have training in statistical theory. They’ve had some applied statistics classes, and have been taught to view all of statistics in terms of samples from a population.

However, I get the sense that Larry Wasserman would be strongly opposed. I come from an Engineering background (just use whatever is to hand and move on to the application) so I am still trying to learn where the faultlines are in the statistical community.

This comes up a lot in adverse impact cases. We know, for example, that old people were laid off at a higher rate than young people. But what we need to know is whether or not they were laid off *because* of age. That naturally leads to a question about the generative process, which can then be answered probabilistically on the whole population. In the simplest case, it defends the use of a Fisher exact test (which I know, Andrew, you like neither the concept of nor the name of), in which the old-young categorization is fixed in the population and, arguendo, the number of people fired is fixed. So both margins are fixed, and the generative process is assumed to be randomness with respect to who got fired. This naturally generates a sensible standard error even with the whole population… or so I’ve argued in court against the critique that you have the whole population and there’s no randomness.
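As a sketch of this fixed-margins logic (the layoff counts below are invented for illustration), the one-sided Fisher exact p-value is just a hypergeometric tail sum, computable with nothing beyond the standard library:

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """One-sided Fisher exact p-value for a 2x2 table with both margins
    fixed: probability of seeing at least `a` old workers among those
    laid off, if layoffs were random with respect to age."""
    row1, row2 = a + b, c + d        # old workers, young workers
    col1 = a + c                     # total laid off
    n = row1 + row2
    total = comb(n, col1)
    # Sum hypergeometric probabilities for outcomes as or more extreme than a.
    return sum(comb(row1, k) * comb(row2, col1 - k)
               for k in range(a, min(row1, col1) + 1)) / total

# Hypothetical counts (for illustration only):
#            laid off   retained
# old            8         22
# young          4         66
p = fisher_exact_p(8, 22, 4, 66)
print(f"one-sided p = {p:.4f}")
```

The randomness here lives entirely in the assumed generative process (who got fired), so the standard error is meaningful even though every employee in the “population” is observed.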

Konrad wrote earlier:

“At present, the concept of a model doesn’t get enough attention and this is why people come away without a clear notion of how (e.g.) random processes are just a made-up description of reality – it would be an easy distinction to make on day 1 of Stats101. By contrast, if instead we work with the concept of “population”, it seems to slip to and fro between referring to reality and an idealised description.”

I agree with the spirit of this, but I really wonder how your Stats101 would go down with the students. In many places, it could be exactly the right course for a small fraction of mathematically mature students, but I fear the converse. (Looking at the existing books, I can’t see that many courses are likely to manage well the speed-up from data analysis done really slowly [a good idea, on the whole] to lots and lots of inferential procedures done piecemeal, as most chapter structures imply. So I am very puzzled from a distance on what such courses can really achieve.)

The fact of the matter is that sampling from a population is a natural hook for many learners (opinion polls etc.), just as it’s also very true that it’s only one kind of statistical problem.

Having never taught statistics I don’t know how this would go down with students in general, and certainly not how it would go down with students who have chosen to study statistics rather than, say, computer science or mathematics. I do know that when I was choosing what to study I avoided statistics because I was under the impression that it was all about things like sampling and (recipe-based) data analysis, which still seems to me like the worst conceivable way of marketing the discipline.

Had I known that it is really about questions regarding how to reason from observations (inferring conclusions from available information, coming up with descriptions of how things work that then allow explanation and prediction), and had this been the way it is taught, my choices as an undergraduate might have been very different.

The only course I am aware of like that was Ian Hacking’s course at the University of Toronto, which I sat in on in the mid-1990s (and which ended up in a book by him).

Not too long ago, measure theory was used to screen people out of statistics and even biostatistics programs ;-)

Apparently Andrew somehow did not have to go through that (by self report).

To avoid making many replies, I will collect them here.

Entsophy: My apologies if the humour in my first line was not apparent. Perhaps it depended on a past post I remember about Nominalism and Larry Wasserman that I can’t locate now.

The reference I gave does offer arguments against your view, but you are free to disregard them. (Peirce though did use the term “evil”.)

This, with a refinement of what is meant by real, does summarize my sense of things:

“Random processes are real and the data are just one of some amorphous cloud of possible outcomes”

Konrad: I agree with “seems to slip to and fro between referring to reality and an idealised description”, and I have commented a couple of times that people often confuse the representation with what is being represented (I think this is the same as what you mean).

As for Larry, I would not presume to guess how he would weigh in on this (we were grad students together), my veiled reference to him was explained above.

Jonathan (a different one): Neat, can you disclose or comment on how well your argument was received in court?

Nick: I did subject two classes of Stats 101 students at Duke to some model (or representation) theory. I even positioned statistics as sitting between math, which is always right, and empirical science, which is always wrong (they liked that): it limits how often one is wrong, without knowing when. I don’t think mathematical maturity mattered much one way or the other. The major problem was that I could not bring it in line with the course’s reinforcement (marks) scheme. Unfortunately, that mostly came down to solving those (to us) very simple puzzle exercises to get the right p-value, confidence interval, implication, etc., and that is very hard to change. They did, though, seem to enjoy that part of the lectures more than many of the other, more usual statistical content.