Agree with what you say, but it doesn’t speak to the problem Shane has, which is a cross-sectional observational data set.

RCTs work to isolate effects because, if randomization is effective, all the other covariates are balanced. Quasi-experimental analysis, often done in a pre-post difference-in-differences framework, assumes that the effects of the covariates are constant over time and that differences in environment other than the policy change studied are similar across treatment and control groups, so they can be differenced out.

Neither of these speaks to Shane’s question, which is that his parameter estimates are contingent on which other variables are included in the model. Andrew said “It’s a different model. So of course.” I’m recommending to Shane that he try to understand why the estimates are changing, and I offered several things for him to think about: interactions between his variables, collinearity due to correlation of the observed variables in the population, and collinearity due to overlap in what the measures as constructed were measuring. Each has different implications for interpreting the model and for choosing whether and how the multiple measures are included in the final model selected.

One particularly insidious issue is that an interaction of two correlated predictors can act as a near proxy for a simple square of one or the other variable.

But even if you hit the right construct as an interaction, your reader has no idea how to interpret it. How do you translate a multiplicative term into a concept the human mind can wrap itself around? Some say it’s a sort of “synergy” which sort of rings a bell. In my experience you have to aim for maximum clarity. There is a wonderful book by Aiken and West called Multiple Regression: Testing and Interpreting Interactions. To interpret an interaction, you often end up spelling it out in more detail, and often by rendering the model as linear components with different slopes.
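The proxy problem mentioned above can be seen in a small simulation. This is only a sketch: the correlation of 0.9, the sample size, and the variable names are assumptions chosen for illustration, not anything taken from Shane’s data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
rho = 0.9  # assumed correlation between the two predictors

# Draw two standard-normal predictors with correlation rho.
x1 = rng.normal(size=n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)

interaction = x1 * x2  # the product term
square = x1**2         # the simple square of one predictor

# Correlation between the product term and the squared term.
r = np.corrcoef(interaction, square)[0, 1]
print(f"corr(x1*x2, x1^2) = {r:.2f}")
```

When the two predictors are this strongly correlated, a model with the product term and a model with the squared term will be hard to tell apart from the data alone.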

As Martha said, good for you for pushing deeper in your understanding.

I will send you via e-mail a manuscript that Nick Brown and I wrote that contains a clear (we think) description of the three kinds of suppression.

Carol

The link is much like the one I gave below.

The reality the Simpson’s Paradox example depicts is that of two intercepts and one slope _not_ two intercepts and two slopes.

Jeff goes into something much more specific (along with a term I have never heard of) but which may line up well with Shane’s current interests. More generally, it’s all about realizing the differences between models and hoping to get the least wrong one.

That would make a good op-ed: the difference between “impostor syndrome” and healthy recognition of one’s need for knowledge and understanding.

Shane, thank you for raising the question about suppressor variables! This discussion is helpful and interesting.

Just for the record, I agree that the important things to consider when a practitioner is choosing a regression model are not issues about matrix algebra but issues about how the model relates to the world. I just think that understanding the matrix algebra is one important way of de-mystifying statistics… it is a reminder that there is no magic going on when you “control” for something.

As for how I think about models when I am doing regression analysis – I think about comparisons in the world. Who is this model comparing to whom in order to make claims about the effect of some thing in the world? Does it make sense to compare within or across units? How do I specify a model that implicitly leverages the comparisons that I think are more valid or illuminating or useful instead of comparisons that are less useful?

RCTs fit this easily: I want to compare the treated and the control units. But fixed-effects models for quasi-experimental or observational studies work similarly. If I want to know the effect of, say, a policy mandating maternal work leave that affected some states and not others, I might want to look at changes in my outcomes in the states that implemented the policy relative to changes in states that did not implement it. IV is the same – leveraging differences across people generated by the instrument forces a comparison among otherwise similar people who were affected by the instrument and those who weren’t.

So yes, I agree with you that it isn’t about matrix algebra. But I don’t think it is about thinking carefully about all the cross-correlations either. It is about thinking about the world and how to use what we know about the world and about regression modeling to identify the effect of interest by making useful (as opposed to misleading) comparisons among people. The matrix bit is just one way to remind ourselves that there is no magic in statistics, only comparisons of differences across observations.
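One way to see the “no magic” point about controlling is a standard result, the Frisch-Waugh-Lovell theorem (not named in the comment above): the coefficient on X1 “controlling for” X2 is just what you get by removing the linear part of X2 from both X1 and Y and comparing what is left. A minimal sketch on simulated data, where the coefficients and variable names are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Simulated data (illustrative assumption): x1 is correlated with x2.
x2 = rng.normal(size=n)
x1 = 0.7 * x2 + rng.normal(size=n)
y = 1.0 * x1 - 2.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Coefficient on x1 in the full model: y ~ 1 + x1 + x2.
b_full = ols(np.column_stack([ones, x1, x2]), y)[1]

# Same number by residualizing: strip the part of x1 and y that is
# linearly predictable from (1, x2), then regress residual on residual.
Z = np.column_stack([ones, x2])
x1_resid = x1 - Z @ ols(Z, x1)
y_resid = y - Z @ ols(Z, y)
b_fwl = (x1_resid @ y_resid) / (x1_resid @ x1_resid)

print(b_full, b_fwl)
```

The two numbers agree up to floating-point error, so “controlling for X2” amounts to comparing observations on the part of X1 that X2 cannot linearly predict, and nothing more.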

It’s not enough to say this sort of thing can happen, because the model is different. One needs to think about what the model is and how well the measures capture the concepts in the model.

In the standard regression model, Y=XB, the underlying conceptual model of how the world works, and of how each of the RHS regressors relates to the LHS dependent variable and to the others, is that each X is correlated with the dependent variable (positively or negatively) and not correlated with the other RHS variables. But that is rarely a description of how the various aspects of the world we are monitoring relate to one another. RHS variables may interact; they may be correlated, so we observe changes in X1 and X2 concurrently.

Researchers using multivariate models need to think about their conceptual model of how the RHS variables interact and how they are associated with one another and their dependent variables. They need to think about whether their measures are correlated, even if there is no interaction or mediation, and even if the concepts they are trying to measure are not correlated.

This is not a problem of matrix algebra, as jrc frames it. Jeff McLeod gets at the issue when he talks about having a theoretical reason why the pattern makes sense. I would be looser than that, applying a lower level of conceptualizing than full-blown theory. And I would think about the meanings of the measures and how closely each measure as constructed relates to the underlying concept being measured as well as to the other measures in the model.

So, my response to Shane would not be to note that he has different models and this happens, but to ask him what his underlying conceptual model is. Does it predict or anticipate interactions among the RHS variables? Does he expect his RHS concepts to be correlated causally, or mediated, or spuriously correlated because they are causally associated with other variables not in the model? Are his measures correlated, even if the underlying constructs are not, and why? What is his causal model? What is his regression analysis confirming or raising questions about regarding his causal model, and how should his model be updated to reflect what he has learned in the analysis?
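To make the dependence on correlated RHS variables concrete, here is a small simulated sketch. The data-generating coefficients (0.5 and 1.0) and the strength of the X1-X2 correlation are assumptions picked for illustration. It shows the coefficient on X1 moving when X2 enters the model, and that the shift equals the X2 coefficient times the slope of X2 on X1.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# Simulated data (illustrative assumption): x1 and x2 are strongly correlated.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = 0.5 * x1 + 1.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Short model: y ~ 1 + x1. Long model: y ~ 1 + x1 + x2.
b1_short = ols(np.column_stack([ones, x1]), y)[1]
_, b1_long, b2_long = ols(np.column_stack([ones, x1, x2]), y)

# Slope from the auxiliary regression of x2 on x1.
delta = ols(np.column_stack([ones, x1]), x2)[1]

print("x1 alone:      ", b1_short)
print("x1 given x2:   ", b1_long)
print("implied shift: ", b2_long * delta)
```

The identity b1_short = b1_long + b2_long * delta holds exactly in the sample. It describes how much the estimate moves, but it cannot say whether the correlation between X1 and X2 is causal, mediated, or an artifact of how the measures were built; that still has to come from the conceptual model.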

Both of my masters are in psych, and I did well in all of my stats classes, but I’ve always felt that I only learned enough stats to just “get by.” Maybe it’s imposter syndrome, maybe not. But, when I eventually get my PhD, I want to ensure that I master as much of this type of material as I can so that I can minimize (as much as is possible) the mistakes that I’ve learned about through reading this blog (and Uri Simonsohn’s and Daniel Lakens’).

Thanks again for the comments!

Thank you for the response! I’ll definitely check those out and look for your article. Coincidentally, I recognized Dr. Crede’s name b/c I just applied to Iowa St.’s PhD program.

Whatever school I get into, I’m confident I’ll learn much more about these procedures and feel more comfortable with my analyses and interpretations.

Thanks again!

I’m still amazed when I realize that people think of “I included a proxy measure of that characteristic in my matrix of right hand side variables as a linear relationship between it and the (probably poorly measured proxy) outcome variable” as “I’ve controlled for that.” Then they internalize “controlled for something” as “it isn’t a statistical problem anymore”. It’s like, because I had a list of 7 of your household assets, I can generate an index and then I’ve “controlled” for wealth. So now I know how much wealth matters, and I know that none of my other parameters are tainted by omitted wealth variables. Sure.

So many problems and questions just start sounding ridiculous after you’ve internalized the math. Once you realize that you are just dealing with a mathematical projection, once you see your X matrix as a series of numbers that vaguely represent things in the world… then all that talk about moderating and mediating and suppressor variables (you know, where people over-literally interpret all of these parameters in relation to real things in the world and speculate wildly using their favorite theory jargon), all that just disappears and you wonder who taught these people to think about a) statistics; and b) the real world. Obviously someone who did them real harm in their quest to understand the way the world actually works.

X1 | X2

X1 | X2,X3

X1 | X2 et al. (for 3 or more)

I might have lost too much faith, actually.

             B     se B
X1 | X2,X3 … 4.2   .8

etc., rather than just X1, but the width of paper would become a problem (particularly with the length of variable names people use nowadays). I think just printing X1 makes people lazy when they say things like the B is the “effect of X1” rather than the “effect of X1 conditional on …”.

After finding a suppressor effect, I would follow it up by breaking the suppressor variable into groups (high vs low) and running regression within each group. If the regression coefficients flip back, you can at least suggest that the interaction is a plausible cause.

So no, you’re not naïve Henning. I’d go the same route. But I look at the suppressor variable as a possible diagnostic that could indicate an unspecified interaction, or a multi-level situation as another commenter indicated.
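The high-vs-low follow-up described above might look something like this. It is a rough sketch on simulated data: the median split, the variable names, and the built-in interaction are all assumptions for illustration, and a median split is only one of several ways to form the groups.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4000

# Simulated data (illustrative assumption): y depends on an x-by-z interaction.
z = rng.normal(size=n)             # hypothetical suppressor variable
x = 0.6 * z + rng.normal(size=n)   # focal predictor, correlated with z
y = x * z + rng.normal(size=n)

def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Split on the suppressor at its median and fit within each half.
high = z > np.median(z)
slope_low = slope(x[~high], y[~high])
slope_high = slope(x[high], y[high])

print("slope of y on x, low z: ", slope_low)
print("slope of y on x, high z:", slope_high)
```

If the within-group slopes differ sharply, as they do here by construction, that is at least consistent with an unspecified interaction. It does not prove one, since a coarse split can also pick up nonlinearity in the suppressor.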

Yes, this is suppression. I suggest that you take a look at Tzelgov & Henik, PSYCH BULL, 1991 or Lewis & Escobar, THE STATISTICIAN, 1986, for easy-to-understand explanations.

For an empirical example, Marcus Crede, Andrew, and I recently pointed out that the results in that famous (or infamous) PNAS air rage study by DeCelles and Norton were due to a particular kind of suppression (negative suppression, which results in a sign reversal).

The original article, our comment, and an off-the-mark reply from DeCelles and Norton have now all been published in PNAS.

Carol

More generally, it is the model that has to be specified well enough to adequately connect to reality in some sense.

Model specification includes all the parameters, some of which are taken as common for all observations, different for subsets of observations (e.g. interactions) or partially pooled (e.g. multilevel models). Getting these (too) wrong in various ways leads to various problems.

(A simple example where using one intercept rather than two leads to Simpson’s paradox – http://andrewgelman.com/2016/09/08/its-not-about-normality-its-all-about-reality/#comment-303932 )
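For readers who do not follow the link, a stand-in for that kind of example can be simulated. This is not the linked example itself, only a sketch with assumed numbers: two groups share a slope of +1 but have different intercepts, and the model is fit once with one intercept and once with two.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000  # observations per group (assumed)

# Two groups with the same slope (+1) but different intercepts (0 and -8).
group = np.repeat([0, 1], n)
x = rng.normal(loc=4.0 * group, scale=1.0)
y = -8.0 * group + 1.0 * x + rng.normal(scale=0.5, size=2 * n)

# One intercept, one slope: ignores group membership.
X_one = np.column_stack([np.ones_like(x), x])
pooled_slope = np.linalg.lstsq(X_one, y, rcond=None)[0][1]

# Two intercepts, one slope: a separate intercept column per group.
X_two = np.column_stack([1 - group, group, x])
two_intercept_slope = np.linalg.lstsq(X_two, y, rcond=None)[0][2]

print("slope with one intercept: ", pooled_slope)
print("slope with two intercepts:", two_intercept_slope)
```

With one intercept the fitted slope comes out negative even though the slope within each group is positive; allowing two intercepts recovers the common slope.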

Besides, when I read the original post, I was reminded of this post: http://marcfbellemare.com/wordpress/12082

Here’s an example from the educational testing world.

The criterion variable Y is college GPA.

Predictor X1 is an aptitude test like the SAT.

Predictor X2 is a measure of reading speed.

Assume all measures are transformed into deviations or z-scores.

Both X1 and X2 predict Y in a positive fashion.

But the regression results show X1 is positive and X2 has flipped negative.

A plausible interpretation is that while X2 is a predictor of college GPA, the composite (b1X1 – b2X2) is a BETTER predictor than either X1 or X2 alone. Stay with me here…

The negative regression coefficient b2 for X2 reflects that the construct of reading speed is more correlated with the residual of X1 than with the criterion itself. More clearly: some of the performance on a timed test like the SAT is valid variance related to college GPA, but some of the variance is an unfair advantage of reading speed that spuriously increases your SAT score. But this artificial advantage on the SAT is not necessarily correlated with the criterion (obviously, it will help in some classes, but not in others).

Again, this is an example. When reading speed is a suppressor, it means that the coefficients — if I may anthropomorphize them — want to clean the spurious construct variance out of the SAT. Consider the regression equations as penalizing fast readers to put them on par with ordinary readers to yield a better prediction of Y.
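The reading-speed story above can be mimicked with made-up numbers. In this sketch everything is an assumption: a latent “aptitude” drives both the SAT and GPA, reading speed inflates the SAT by 0.5 but helps GPA by only 0.1, and all three observed measures are z-scored as in the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical latent traits, independent of each other.
aptitude = rng.normal(size=n)
reading_speed = rng.normal(size=n)

# SAT (X1) is contaminated by reading speed; GPA (Y) barely depends on it.
sat = aptitude + 0.5 * reading_speed
gpa = aptitude + 0.1 * reading_speed + rng.normal(size=n)

def zscore(v):
    """Center and scale to unit standard deviation."""
    return (v - v.mean()) / v.std()

y, x1, x2 = zscore(gpa), zscore(sat), zscore(reading_speed)

# Zero-order correlation of reading speed with GPA.
r_x2_y = np.corrcoef(x2, y)[0, 1]

# Standardized regression of GPA on SAT and reading speed together
# (no intercept needed because every variable is centered).
b1, b2 = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)[0]

print("corr(reading speed, GPA):", r_x2_y)
print("b1 (SAT):", b1, " b2 (reading speed):", b2)
```

Reading speed correlates positively with GPA on its own, yet its coefficient turns negative once the SAT is in the model, because the regression uses it to subtract the reading-speed share of the SAT score.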

Good for you Shane Littrell for your interest in going deep.
