How do you interpret standard errors from a regression fit to the entire population?

I’m working on some regressions for UK cities and have a question about how to interpret regression coefficients. . . .

In a typical regression, one would be working with data from a sample and so the standard errors on the coefficients can be interpreted as reflecting the uncertainty in the choice of sample. In my case, I’m working with every city in the UK so the error interpretation isn’t as clear. There are two sources of confusion:

1. Since I’m working with a full population, can I just ignore the coefficient errors or do they have an additional interpretation that might be relevant? I’ve seen some mention of finite populations but am not sure how this might apply in a classical regression.

2. The definition of a city is itself somewhat uncertain. In my study I’m looking at about six definitions, each one consisting of a full population of UK cities (though each definition has a different number of cities; it’s not just the attributes of each city that change but the population size itself). Would it be sensible to interpret regression coefficient errors as capturing this uncertainty, or would an alternative model formulation be more appropriate?

1. Some econometrics heavy-hitters recently weighed in on this very question as well:

Finite Population Causal Standard Errors
Alberto Abadie, Susan Athey, Guido W. Imbens, Jeffrey M. Wooldridge
NBER Working Paper No. 20325
Issued in July 2014
http://www.nber.org/papers/w20325

• Andrew says:

Bill:

Thanks for the link. I’ll take a look. On quick glance the paper seems consistent with what I wrote. In particular, this bit of theirs, “these standard errors capture the fact that even if we observe outcomes for all units in the population of interest, there are for each unit missing potential outcomes for the treatment levels the unit was not exposed to,” seems similar to my idea that we can be interested in what is happening in the same places in other years. Just to be clear, I’m not saying at all that they are copying me; rather, I think this is a fundamental idea that should be expressible in different statistical settings.

2. eps says:

Hi,

Interesting reading your answer. This paper by Frick deals specifically with similar problems in experimental psychology. As I understand it, you would argue much the same way?

3. Anonymous says:

I blame statistical education. Everything is framed in terms of sampling from a population rather than what people intend to learn from these studies, which is the underlying causal relationships. Even with regard to cities, there’s usually something along the lines of “how did this policy lead to different outcomes across cities”, in which case the population of interest isn’t _the_ population, but rather the space of “possible” cities.

4. Matthew says:

I’m sure I’m missing something here, perhaps big or perhaps subtle, but is it not enough just to say that the uncertainty is due to the fact you haven’t been able to observe the error term?

If you had a perfect model of the population you wouldn’t need an error term at all, right? You include it because you think there’s some variance that the deterministic part of the model can’t explain.

You could think about repeated draws of the response under the same population of covariates. The error term and the resulting coefficient estimates would be different each time.
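A minimal sketch of this thought experiment, assuming a made-up population of 60 "cities", a single covariate, and normally distributed errors (none of these numbers come from the original question). The covariates stay fixed, the response is redrawn many times, and the spread of the fitted slope is compared with the classical standard error formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "full population": 60 cities with one fixed covariate.
n = 60
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])   # design matrix with intercept
beta_true = np.array([1.0, 0.5])       # assumed underlying coefficients
sigma = 2.0                            # assumed error standard deviation


def ols_slope(y):
    """Least-squares slope for the fixed design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


# Redraw the response many times while keeping the covariates fixed.
slopes = np.array([
    ols_slope(X @ beta_true + rng.normal(0.0, sigma, size=n))
    for _ in range(5000)
])

# Classical standard error of the slope, using the known sigma.
se_theory = sigma * np.sqrt(np.linalg.inv(X.T @ X)[1, 1])

print(f"sd of slope across redraws: {slopes.std(ddof=1):.4f}")
print(f"classical standard error:   {se_theory:.4f}")
```

Under these assumptions the two printed numbers should be close, which is the sense in which the standard error describes variation in the error term rather than in the choice of sample.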

In the case of comparing two population means, you can decide that you’re only interested in comparing the two population means (in which case no need for an error term or a statistical model), but more likely you think there’s a true underlying mean which you have a noisy observation of, and you want to answer questions about differences in those underlying population means, not the noisy observed population means.

• Matthew says:

… and if talk of hypothetical replications bothers you, can you not just take the Bayesian interpretation? You were uncertain about the coefficients a priori, you observed a noisy function of them (who cares whether it’s all or some of any given population), and you end up with uncertain posterior beliefs. You might need to quantify this uncertainty with more than just a point estimate, depending on what sort of decision you need to make and what posterior expected loss it minimises.
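One way to make the Bayesian reading concrete is the textbook conjugate case: a normal prior on the coefficients and a known error standard deviation. The sketch below assumes both, along with invented data for 60 units; `sigma`, `tau`, and the coefficients used to generate the data are illustrative choices, not anything from the original question.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical complete population of 60 units with one covariate.
n = 60
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])

sigma = 2.0   # assumed known error standard deviation
tau = 10.0    # assumed prior sd for each coefficient (prior mean zero)

# Simulated noisy responses; the generating coefficients are arbitrary.
y = X @ np.array([1.0, 0.5]) + rng.normal(0.0, sigma, size=n)

# Conjugate normal update with known sigma:
#   posterior covariance = (X'X / sigma^2 + I / tau^2)^(-1)
#   posterior mean       = posterior covariance @ (X'y / sigma^2)
prior_precision = np.eye(2) / tau**2
post_cov = np.linalg.inv(X.T @ X / sigma**2 + prior_precision)
post_mean = post_cov @ (X.T @ y / sigma**2)
post_sd = np.sqrt(np.diag(post_cov))

print("posterior mean (intercept, slope):", post_mean)
print("posterior sd   (intercept, slope):", post_sd)
```

The posterior standard deviations are smaller than the prior's but not zero, even though every unit in this pretend population has been observed, because the noise in the responses is still there.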

Perhaps things get a bit more subtle if you’re interested in doing causal inference?

5. question says:

“In a typical regression, one would be working with data from a sample and so the standard errors on the coefficients can be interpreted as reflecting the uncertainty in the choice of sample. In my case, I’m working with every city in the UK so the error interpretation isn’t as clear.”

Say you are studying a complete population of boxes. You have measured the volume and the length of each box, but not the width or the height. You would see a correlation between length and height but it would not be perfect. So isn’t another interpretation of the error simply that not all the influences have been measured? Also you always have measurement error, which is what I understand the second point to be about.

• question says:

correction: “You would see a correlation between length and _volume_ but it would not be perfect.”
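The box example can be checked numerically. This is only a sketch with invented dimensions (500 boxes, each side drawn uniformly between 1 and 5), and it works on the log scale so that volume is exactly linear in the three dimensions. There is no sampling and no measurement error, yet regressing on length alone leaves residual scatter, which disappears once width and height are included.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical complete population of boxes, measured without error.
n = 500
length = rng.uniform(1.0, 5.0, size=n)
width = rng.uniform(1.0, 5.0, size=n)
height = rng.uniform(1.0, 5.0, size=n)
volume = length * width * height


def r_squared(X, y):
    """R^2 of a least-squares fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


log_volume = np.log(volume)

# Only length is "measured": width and height end up in the error term.
r2_length_only = r_squared(np.log(length), log_volume)

# All three dimensions measured: log volume is an exact linear function.
all_dims = np.column_stack([np.log(length), np.log(width), np.log(height)])
r2_all = r_squared(all_dims, log_volume)

print(f"R^2 with length only:     {r2_length_only:.3f}")
print(f"R^2 with all dimensions:  {r2_all:.6f}")
```

With three independent, identically distributed dimensions, length alone should account for roughly a third of the variance in log volume; the rest is what the unmeasured influences contribute.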

• question says:

I have to wonder if I am misunderstanding what is meant by regression and standard error here. Is my comment stupid?

• Kyle C says:

The suspense is killing me.

I gather from Andrew’s earlier post that you’re right and the trick is to think about what you think your regression line actually means in the first place.

• It’s a good point. What the heck is your regression line for? Certainly it can’t be for predicting values for the cities in question: we have measured values for those. It *could* be for discovering the size of the measurement errors: if you think that the regression line is somehow closer to the truth than the measurement, then the regression residuals could give you an order of magnitude of the measurement error. It could be for trying to find causal information, such as when the regression is fit to some covariates that are thought to actually cause the different outcomes in the different cities in some way. It could be a way to estimate functions of the data for use in other contexts. For example, if you want a simple formula that estimates the income per capita in a certain region of space, you could specify the region, find the cities in that region, and add up the data, or you could have a regression function which, when you integrate it over that region, gives approximately the same value as the sum over the data….

There are lots of things you could do with a regression, and the meaning of the uncertainty changes when you change the model and the purpose of the model.