When I was a kid I took a writing class, and one of the assignments was to write a 1-to-2-page story. I can’t remember what I wrote, but I do remember the following story from one of the other kids. In its entirety:
I snuck into this pay toilet and I can’t get out!
In the discussion period, the kid explained that his original idea was a story explaining the character’s situation: how he got into this predicament and how he got stuck. But then he (the author) realized that the one sentence captured the whole story; there was really no need to elaborate.
(To understand the above story, you have to know the following historical fact: Pay toilets in the U.S., decades ago, were not the high-security objects shown (for example) in the picture above. Rather, they were implemented via coin-operated locks on individual toilet stalls. So it really would be possible to sneak into certain pay toilets, if you were willing to crawl under the door or climb over it.)
Anyway, this is all preamble to a very short statistics story.
Jessica Smith wrote in with the following question:
In multilevel modeling, is it appropriate to aggregate the outcome variable and include it as a control variable at the contextual level? For example, if you’re predicting depression as a person-level outcome, is it appropriate to control for average neighborhood-level depression? If it is or isn’t appropriate, is there something I can cite along these lines?
My reply: In this case, I think you have to be careful. It is better to avoid using a variable to predict itself.
There were all sorts of things I could’ve said, about simultaneous-equation models and measurement-error models and latent variables and time sequences. But, upon reflection, it seemed to me that the two-sentence answer said it all.