Causal identification + observational study + multilevel model

Sam Portnow writes:

I am attempting to model the impact of tax benefits on children’s school readiness skills. Obviously, benefits themselves are biased, so I am trying to use the doubling of the maximum allowable additional child tax credit in 2003 to get an unbiased estimate of benefits. I was initially planning to attack this problem from an instrumental variables framework, but the measures of school readiness skills change during the course of the study, and I can’t equate them. My (temporary) solution is to use a multilevel model to extract the random effect of the increase of the benefit on actual benefits, and then plug that random effect into my equation looking at school readiness skills. The downside of this approach is that I can’t seem to find much research that suggests this is a plausible solution. Do you have an initial thoughts about this approach, or perhaps papers that you’ve seen that use a similar approach. I don’t know any one with expertise in this area, and want to make sure I’m not going down a rabbit hole.

My reply:

To start I recommend this post from ten years ago on how to think about instrumental variables. In your case, the idea is to estimate the effect of the doubling of the tax credit directly, and worry later about interpreting this as an effect of tax benefits more generally.

Now that you’re off the hook regarding instrumental variables, you can just think of this as a regular old observational study. You have your treatment group and your control group . . . ummmm, I don’t know anything about the child tax credit, maybe this was a policy change done just once, so all you have is a before-after comparison? In that case you gotta make a lot of assumptions. Fitting a multilevel model might be fine, this sort of thing makes sense if you have individual-level outcomes and individual-level predictors with a group-level treatment effect.

So really I think you can divide your problem into three parts:

1. Causal identification. If you were going to think of the doubling of the tax credit as your instrument, then just try to directly estimate the effect of that treatment. Or if you’re gonna be more observational about it and make use of existing variation, fine, do that. Just be clear on what you’re doing, and I don’t see that much will be gained by bringing in the machinery of instrumental variables.

2. The observational study. This is the usual story: treatment group, control group, use regression, or matching followed by regression, to control for pre-treatment predictors.

3. Multilevel modeling. This will come in naturally if you have group-level variation or group-level predictors.
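To make part 2 concrete, here’s a minimal simulation of the usual observational-study logic (all numbers invented for illustration): the raw treated-vs-control difference is confounded by a pre-treatment predictor, while a regression that adjusts for that predictor recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
pre = rng.normal(size=n)                        # pre-treatment predictor
p = 1 / (1 + np.exp(-pre))                      # treatment more likely when pre is high
d = rng.binomial(1, p)                          # treatment indicator
y = 1.0 * d + 2.0 * pre + rng.normal(size=n)    # true treatment effect is 1

# Raw comparison of group means: confounded by pre
raw_diff = y[d == 1].mean() - y[d == 0].mean()

# Regression of y on d and pre (plain least squares, no extra libraries)
X = np.column_stack([np.ones(n), d, pre])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
adjusted = coef[1]                              # close to the true effect of 1
```

Matching followed by regression, as in part 2 above, is doing a more careful version of the same adjustment.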


  1. Z says:

    To answer what I think was part of Sam’s question, random effects models are not appropriate to adjust for confounding because they assume that the random effects are independent of covariates (which obviously is not true if covariates are confounders).
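    A quick numpy sketch of Z’s point (simulated data, invented numbers): when group effects are correlated with a covariate, the pooled fit is biased, while the within-group (“fixed effects”) comparison is not. Random-effects estimates sit between these two, so they inherit some of the pooled bias.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_per = 200, 20
beta = 1.0                                    # true within-group effect of x on y

u = rng.normal(size=n_groups)                 # group effects
g = np.repeat(np.arange(n_groups), n_per)     # group index per observation
x = u[g] + rng.normal(size=g.size)            # x correlated with u: confounding
y = beta * x + u[g] + rng.normal(size=g.size)

def slope(a, b):
    ac = a - a.mean()
    return (ac * (b - b.mean())).sum() / (ac ** 2).sum()

pooled = slope(x, y)                          # ignores groups: biased upward

xbar = np.bincount(g, x) / n_per              # group means (balanced groups)
ybar = np.bincount(g, y) / n_per
within = slope(x - xbar[g], y - ybar[g])      # demeaning removes u: close to beta
```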

    Also, Andrew, I continue to be baffled by your wholesale dismissal of instrumental variables. You say to “worry later about interpreting this as an effect of [the treatment of interest] more generally”. Well, as soon as he follows your advice to estimate the average causal effect of the instrument on the outcome, it becomes “later”, and instrumental variable methods tell him the proper way to estimate the (or “an”; there are multiple possible causal estimands, depending on assumptions) effect of the treatment of interest. I have no idea how appropriate an instrumental variable analysis is for Sam’s particular case, but I think your repeated general advice to pretty much always avoid instrumental variables regardless of whether the necessary assumptions hold is harmful.

    • Andrew says:


      I do not give the advice “to pretty much always avoid instrumental variables.” Instrumental variables are fine, indeed I’ve argued that the potential-outcome approach to causal inference is all about instrumental variables, in that the idea of an instrument is the same thing as the idea of a treatment or a do-operator or whatever. I think instrumental variables are a great way of thinking about causality and estimating causal effects, and my recommendation is to start by thinking carefully about all the effects of the instrument. See, for example, section 4 of my review of Mostly Harmless Econometrics.

      On your other point, I don’t like the term “random effects model” because it’s used in different ways in different contexts. As I wrote above, multilevel modeling comes in naturally if you have group-level variation or group-level predictors; indeed, in such settings it can require a bit of contortion to avoid multilevel modeling.

      Finally, to return to the main point of my post above, I think that whenever we have an observational study, it’s good to think about general ideas of comparing treatment and control groups. Sometimes people seem to think of natural experiments, instrumental variables, discontinuities, etc., as ways of avoiding the need to think hard about causal inference in an observational study; I prefer to think of such pieces of additional information as useful in building robust models for causal inference in such settings.

      • Z says:

        >”Instrumental variables are fine, indeed I’ve argued that the potential-outcome approach to causal inference is all about instrumental variables, in that the idea of an instrument is the same thing as the idea of a treatment or a do-operator or whatever.”

        I don’t understand the above statement. In context, it reads to me like: “How could I be opposed to Taiwanese independence? I love Taiwan, because I love all of China and Taiwan is part of China.” (Fraught analogy, I know.) You’re advocating that when a problem has IV structure that structure should be ignored and what would have been used as the instrument should just be used as the treatment. That is not being fine with instrumental variables! (Unless I’m misunderstanding your advice.)

        I completely agree about the term ‘random effects model’ being confusing because it’s used different ways in different places. I was just responding to what I was guessing Sam might have meant in his question, but I’m not sure exactly what he meant.

        >”Sometimes people seem to think of natural experiments, instrumental variables, discontinuities, etc., as ways of avoiding the need to think hard about causal inference in an observational study”

        This might be true sometimes, but people also sometimes use these methods precisely because they have thought hard about causal inference, realized that there was unobserved confounding between their treatment and outcome of interest that no amount of thinking could make go away, and also realized that a variable meeting the requirements for an instrument exists and can be used to estimate something close to the causal quantity they care about.

        • Andrew says:


          No, I am not advocating that when a problem has IV structure that structure should be ignored.

          Regarding your last paragraph: I agree that natural experiments, instrumental variables, discontinuities, etc. can be useful in causal inference. Having such structure does not typically remove the need to think about confounding in an observational study, but it can make the resulting inferences more robust to certain assumptions. That’s great: it doesn’t solve all problems but it can improve our inferences. Problems arise when people think that these identification strategies remove the need to think about observational data.

          • jrc says:

            It took me a while to get this point from you. Both the “yeah natural experiments are great, but..” thing and your “IV as effect of change in T, that occurs via Z, on Y”. You pushing on both of those points has been helpful.

            There is a certain thread of thinking (or at least talking) among applied micro economists that says “exogenous variation is the beginning and end of causal inference.” That is how people can say things like “measurement error will attenuate my point estimate, so I must REALLY have found something if I have ***’s in my tables” which is the applied micro version of “my sample size was small, so you should believe my effects MORE.*”

            I think applied micro made a great leap forward with its emphasis on relating variation in the world to variation in the data to identification of a regression coefficient. But recently I worry we have grown too comfortable there. Like, all social scientists are just sort of wandering through the Garden of Forking Paths, but the Economists are like “see, our paths are concrete.” I mean sure, I’d rather walk on concrete than on mud, but I’d also rather not be lost in a Metaphorical Garden all day. Sounds really metaphorically boring and wasteful. Like, that place looks expensive and I feel like I’d be wasting money just standing there**.

            *Except I guess when Heckman says it. Then the small-N thing itself is the applied micro version of the small-N thing.


          • /voice of John Cleese

            Announcer: Statisticians the world over are tired of arguing over the meaning of the chain rule of calculus. Is dy/dz = dy/dT dT/dz or not? We take you live to Brandley Park where Archibald P Thornwall and Reginald R Duckwaddler are prepared to fight to the death in order to settle this matter once and for all. Archibald. What is your position in this matter?

            Archibald: Well, Brian, as you know here in the Economics department we believe that z is an instrument variable. And we’re tired of defending this usage against Statisticians who like to use z for all sorts of things, like the orthogonal coordinate to the plane, or the population of Zebras, or even, can you imagine, the ratio of a difference from the mean to the standard deviation! We simply can’t have it. And then they’re always going on about how T is a function of z and z only acts through T, requiring us to take an additional derivative in our models. It’s not to be tolerated, and I’m here to stop it.

            Announcer: Thank you Archibald, how about you Reginald? What is your position in all of this?

            Reginald: Well, to tell you the truth, I think we started using z scores before the Economics profession began even thinking about the idea of instrument variables. And in any case, it’s in the tables at the back of all the books, and I’ll be damned if we’re going to rewrite those books just so undergraduate Economics majors can avoid confusion. I mean, as an undergrad, why bother even taking Statistics if you don’t want to be confused? No, we really can’t have that. And, as for T well obviously y = T(z) except of course for Student who makes p = t(z) but we’ve always allowed him a certain degree of freedom. In any case, I’m just going to go out there and pummel Archibald until he gives up and takes the derivative of T with respect to z because it’s about time Statistics got some respect.

            Announcer: Well, there you have it, back to you Nigel.

          • Z says:

            “Thinking about observational data” is the only way to decide that an instrument and instrumental variable analysis based on that instrument are valid…

            I think I’m just going to continue to be baffled by your stance on this and continue to base my understanding of IVs on the thorough and rigorous review from the link I included in my initial comment:

            • Andrew says:


              That reference you cite is just fine, and I don’t think it contradicts anything that I’ve written on the topic.

              • To me the thing about all this IV stuff is that it’s written in the language of “magic” rather than of understanding and mathematical modeling. Even the cited reference is subtitled “An Epidemiologists Dream” and in the intro:

                “Regardless of how immaculate the study design and how perfect the measurements, the unverifiable assumption of no unmeasured confounding of the exposure effect is necessary for causal inference from observational data, whether confounding adjustment is based on matching, stratification, regression, inverse probability weighting, or g-estimation. Now, imagine for a moment the existence of an alternative method that allows one to make causal inferences from observational studies even if the confounders remain unmeasured.”

                But IVs aren’t magic. They’re just what you get when some causal process happens, and *that causal process isn’t confounded*. An IV is just “someone did something, and that thing isn’t thought to be confounded with an unknown.”

                If you pass a law on sentencing, and a law increasing the police funding, and they take effect at the same time… you could measure the causal net effect of both, but you can’t tease out the effect of each one. If you pass this set of laws, and at the same time Louie the mob kingpin decides independently to pull his operations out of Dodge City, then your estimate is confounded with the unknown “Louie” effect. The coincidence that they both happened to take effect Jan 1 isn’t magically fixed by the IV approach, it’s just that the plausibility that there are many things that happen to be changing at the same time as some policy or whatnot is small.

                Any un-confounded cause is… an un-confounded cause. Giving it a special name and putting it on a pedestal… not helpful in increasing modeling thinking and decreasing magical thinking.

              • jrc says:


                Don’t worry, “machine learning” is the new magic, and in ten years IV will be that old thing that people don’t really believe anymore (like matching and selection models before it).

                In 20 years the credibility and replicability crisis in science will actually be solved when the machines destroy us all.

              • Z says:

                Daniel: To me the thing about all this IV stuff is that it’s written in the language of “magic” rather than of understanding and mathematical modeling. Even the cited reference is subtitled “An Epidemiologists Dream”

                It’s subtitled “An Epidemiologist’s Dream?” with a *question mark*. Maybe you’re familiar with the pattern that whenever a question is asked in a headline the answer in the body of the article is ‘No’? Well, the answer they give in this case is No, too. These aren’t snake oil salesmen for instrumental variables. Your comment about this being written in the language of magic is really comical given that this article is co-authored by Jamie Robins, who established the mathematical formalism of causal inference for time varying treatments and is an extremely rigorous thinker. Maybe you should read past the first paragraph of the introduction, which is a hook. The substance of what they’re saying here, once you’ve accepted the counterfactual framework, is just math and isn’t really up for debate. If you’re still “not on board” after reading the rest of the paper and the appendix you can go to its references. (Given the lengthy debates I’ve seen you get into in the comments on similar topics, I know you think you already have a deep understanding of causal inference and that the topic necessarily reduces to fully modeling entire processes. It doesn’t, try to have an open mind.)

                Andrew: “I don’t think it contradicts anything that I’ve written on the topic.”

                I feel like I’m getting gaslighted here.

                Your recommendation: Don’t use the instrument to estimate the effect of the treatment of interest, just estimate the effect of the instrument.

                Their recommendation: If the necessary assumptions hold, use the instrument to estimate the effect of the treatment of interest.

                Maybe there’s some semantic sense in which you haven’t contradicted them, but the basic thrust of your advice seems to me to clearly contradict the basic thrust of their advice, which is also pretty much the consensus advice of the causal inference community.

              • I’m more speaking to my impression of the IV usage than this particular article. For example the Chinese coal policy paper from a few years ago. It uses a policy change in the 50’s to magically give causal estimates in the 2000’s. If people give a meaningful causal model and estimates that’s fine. But IV is not always used that way, often the exogenous cause is seen as a way to automatically get correct inference, as if it’s not possible to have confounding. These things just reduce plausibility of confounding. In the end, if you don’t have lab like control to observe consistency, you still have an observational study where one observation seems more like a controlled treatment than other observations

              • Also the stuff about calculus above. Andrew recommends estimating dy/dT dT/dz and you recommend estimating dy/dz… They are the same quantity. The difference is in the method of estimating. Andrew’s recommendation allows more sophisticated models because it includes information about an intermediate quantity that may be affected by several variables other than z. That’s how I read it at least.

                Finally even if the authors of your recommended paper aren’t magical thinkers, the existence of their hook suggests that the audience is, and that the authors know it.

              • Z says:

                “They are the same quantity.”
                Nope, they are not the same quantity. Andrew recommends estimating the effect of the instrument on the outcome (and I guess separately on the treatment of interest as well). I recommend using the instrument to estimate the effect of the treatment of interest on the outcome. Those are different quantities, and this is a substantive disagreement.

                “Finally even if the authors of your recommended paper aren’t magical thinkers, the existence of their hook suggests that the audience is, and that the authors know it.”
                Yes, I agree. But this debate has not been about describing the population of users of IV methods. It’s been about IV methods themselves and when they should be recommended.

              • jrc says:


                But at the same time that you directly instrument, we also tend to show the reduced form (dY/dZ) and the first stage (dT/dZ) before showing the IV (dY/dT). Right? Just making the point that the whole chain is important (I mean, the IV estimate is just a re-scaling of the reduced-form estimate anyway).
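                jrc’s rescaling point can be checked in a few lines (simulated data, invented coefficients): the IV/Wald estimate is literally the reduced form divided by the first stage, and it sidesteps confounding that biases the naive regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
z = rng.binomial(1, 0.5, n)           # instrument
u = rng.normal(size=n)                # unobserved confounder of T and y
T = 0.5 * z + u + rng.normal(size=n)  # first stage: dT/dz = 0.5
y = 2.0 * T + u + rng.normal(size=n)  # true effect: dy/dT = 2

def slope(a, b):
    ac = a - a.mean()
    return (ac * (b - b.mean())).sum() / (ac ** 2).sum()

reduced_form = slope(z, y)            # dy/dz, about 1.0
first_stage = slope(z, T)             # dT/dz, about 0.5
iv = reduced_form / first_stage       # about 2.0: the reduced form, rescaled
ols = slope(T, y)                     # biased upward by the confounder u
```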

                But in general, I think you are missing a crucial point about interpretation. Interpreting the result of the final IV regression as a “pure causal effect” is usually unwarranted, and you’d do better to think about it as the average effect of T (your covariate of interest) being changed by Z (instrument) on Y (the final outcome of interest).

                That may or may not generate a “better” estimate of some average population causal effect than some other model. The difference is in the understanding that not everything that moves T will lead T to have the same effect on Y, and moving T for different people will produce different changes in Y. You could think of that as LATE (a common economist framing) or you could think of it as an under-specified structural model that says all changes in T have the same effect on Y, but doing that is just adding assumptions to make the analysis sound more like it is universally applicable (pitching your “external validity”). And I think that part of Andrew’s concern with IV is the idea that researchers (and consumers of research) think once they have “a” causal effect, they have “the” causal effect. By keeping the chain of reasoning in mind, it leaves it open as to how changes in T from other sources of (exogenous) variation would affect Y (or changes in T for other people in the population).

                I know economists don’t usually think that way, but I think it is a more honest way of thinking about IV, and often more useful to me. I mean, we can talk all we want about the math, but in the real world instruments come in more and less plausible forms, and move T in different ways, and T itself has varying effects that are poorly modeled and averaged over. IV doesn’t do anything to save you from that. And I think Andrew thinks (and I tend to agree) that once people internalize the math, they use it as an intellectual and rhetorical cheat to claim they’ve found “the causal effect” and they can pretend the world isn’t complicated and messy. Sure, this critique applies to almost any statistical analysis, but I think that is kind of the point. Andrew’s interpretation of IV is just his general critique about regression models applied to the specifics of the IV setting – everything has varying effects, and any method that pretends to completely describe those effects in a number is a sham… we should strive to do better, and part of doing better is thinking clearly through the limitations and alternative interpretations of our models.

              • Ram says:

                I don’t think anyone disputes the mathematical analysis of IV, whether of the potential outcomes/counterfactuals or DAG/structural model variety. The question is what role IV should play in applied causal inference.

                Using an IV often ties our hands in terms of what estimand is of interest. We often want to estimate quantities besides the particular quantity a given IV identifies; thus insisting on an IV approach restricts the kinds of questions we can ask.

                Under the assumptions of IV analysis, use of the IV solves the confounding problem, but we have to ask whether these assumptions are met in any given problem, and this is not answerable using the data available in general. This means that the claim “Z is an IV” is not fundamentally different in kind from the claim “all and only XYZ are confounders”. Neither is demonstrable using the data, and the plausibility of each is judged by the audience based on their background knowledge.

                In some sense IV is cleaner than controlling for confounding (regression/matching/etc.) since if you’ve got one IV then you’re done, whereas even if you list out XYZ there’s always the question of whether you’ve got everything on the list. Still, wondering whether you’ve hit all the confounders is not really dissimilar from wondering whether your IV really meets the exclusion restriction. If it doesn’t, you’ll have residual confounding, and will need to adjust for this.

                Then there’s the question of appropriate interpretation. In the case of a weak instrument, the 2SLS estimator becomes really noisy, and this raises the question of what use the point estimate is, or even an interval estimate that spans implausible values.

                Long story short, IV is most certainly a helpful tool, but unless you’ve got a strong instrument and everyone in your audience agrees that it definitely meets the exclusion restriction, and it happens to identify the parameter(s) of actual interest (varying treatment effects?), as opposed to some other parameter which is only somewhat related, then to really answer the question of interest you’re going to need to do some careful modeling, and this will probably involve some control of confounders among other things. So while an IV or IVs can be helpful, finding one is usually only the beginning of the modeling exercise.
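                Ram’s weak-instrument point is easy to see by simulation (invented numbers): with a strong first stage the Wald estimates cluster near the truth, while with a weak first stage the same estimator, on the same sample size, becomes wildly dispersed because the denominator is frequently near zero.

```python
import numpy as np

rng = np.random.default_rng(2)

def wald(n, first_stage_coef):
    z = rng.binomial(1, 0.5, n)
    u = rng.normal(size=n)                   # unobserved confounder
    T = first_stage_coef * z + u + rng.normal(size=n)
    y = 2.0 * T + u + rng.normal(size=n)     # true effect is 2
    zc = z - z.mean()
    return (zc * y).sum() / (zc * T).sum()   # reduced form / first stage

strong = np.array([wald(2000, 1.0) for _ in range(500)])
weak = np.array([wald(2000, 0.05) for _ in range(500)])
# strong estimates sit near 2; weak ones are spread over a huge range
```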

              • Andrew says:


                1. This post from a few years ago might clarify the ways in which I think that the concept of an “instrumental variable” is actually central to thinking about causal inference for observational studies. In many or most situations, I can’t even think about causality without putting the problem into an instrumental variables framework.

                2. I also refer you to this comment from earlier this year. Natural experiments are great, and I think that when we have such scenarios it makes sense to look at all the effects that flow from such a “treatment.” But let’s remember that, at least in the problems in social and environmental sciences where I do most of my work, these are not really experiments; they’re observational studies, and an instrumental-variable structure does not let us off the hook from all the usual concerns of an observational study.

              • Let’s take a structural approach to some problem, and assume algebraic structure (rather than say an ODE or something more complicated). Also, we’ll take the case where changes are small enough to approximate with linearity because that’s pretty common, but we could in fact do Taylor series with higher order terms or something.

                We’re interested in how y works, we have a structural equation with unknown functions f,T

                y = f(a,b,T) + erry
                T = T(a,q,z) + errT

                now, substitute in T to the equation for y

                y = f(a,b,T(a,q,z)+errT) + erry

                Now, assume some small changes in z and corresponding small changes in T because it’s illuminating and done often enough:

                dy = df/da da + df/db db + df/dT dT =

                df/da da + df/db db + df/dT (dT/da da + dT/dz dz + dT/dq dq)

                Now suppose z is our instrument, where a change in z is imposed in a way that makes it implausible that the change also changed a, the confounding variables (a could be a vector for example, and a could be things we don’t even know about, we just know they exist)

                now da = 0 and we’re left with:

                dy = df/db db + df/dT * (dT/dz dz + dT/dq dq)

                now, if we’ve measured b,q, we can solve for the thing of interest:

                dy – df/db db = df/dT * (dT/dz dz + dT/dq dq)

                (dy-df/db db) / (dT/dz dz + dT/dq dq) = df/dT

                and this is the “magic” of instrument variables as far as I can see. They allow you to not have to learn multivariable calculus so long as you’re able to follow push-button instructions in the SAS or STATA or SPSS manual or whatever.

                the “da = 0” condition is known as the “exclusion restriction” and dT/dz is known as the “first stage”

                The whole thing is an exercise in someone figuring out some mathematical understanding, and some other people who don’t understand that mathematical basis creating a magical cargo cult to surround it.

                Please tell me I’m wrong. I personally just stick to creating mathematical models based on people telling me how they think the physics, chemistry, biology, economic decision making, etc. work. But if you can tell me that there’s something deep that I’m missing here… I’d be interested to know it.
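                Daniel’s derivation can be checked numerically with a linear instance of his structural equations (made-up coefficients). When z is independent of a, b, and q, the b and q terms average out, so the ratio of the z-slopes recovers df/dT even though a is unmeasured, while regressing y on T directly does not.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
a = rng.normal(size=n)   # unmeasured confounder: enters both equations
b = rng.normal(size=n)   # measured, enters f only
q = rng.normal(size=n)   # measured, enters T only
z = rng.normal(size=n)   # instrument, independent of a (the "da = 0" condition)

T = 1.0 * a + 0.7 * q + 0.5 * z + rng.normal(size=n)   # dT/dz = 0.5
y = 3.0 * T + 2.0 * a + 1.5 * b + rng.normal(size=n)   # df/dT = 3

def slope(x, v):
    xc = x - x.mean()
    return (xc * (v - v.mean())).sum() / (xc ** 2).sum()

df_dT = slope(z, y) / slope(z, T)   # variation in z isolates df/dT: about 3
confounded = slope(T, y)            # direct regression, biased by a
```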

              • ojm says:

                +1 to your linked post on instrumental vars and comments like

                > I find a claim such as “skin color does not affect intellectual capacity” to be undefined (until I know what instrument is being considered to affect skin color)

                Similarly I find examples like Pearl’s ‘mud does not cause rain’, ‘symptoms do not cause disease’ etc ill-defined.

              • ojm says:

                Though I’m not really sure I like the ‘variable’ part of ‘instrumental variable’. It seems to me that the reason they make more intuitive sense is that they describe _processes_ rather than variables

              • im ok! says:

                Whew, just came to; I was knocked over by that tsunami of arrogance from Daniel’s above post. Just letting everyone know I’m OK, and fully recovered.

              • ojm says:

                Though perhaps there is a further subtlety here.

                You I think are talking about wanting to know the process for bringing about the change in skin color/mud level/whatever and I seem to be talking about wanting to know about the process for bringing about a subsequent ‘effect’ given this ‘change in level’.

                In both cases, however, I find the concept of a causal _process_ more intuitive than talking about one ‘variable’ causing another ‘variable’. Kinetics vs kinematics. I wonder if any of Wesley Salmon’s work made it into the stats world?

            • ojm says:

              For example ‘there is no process converting mud to rain’, which of course is not strictly true, it’s just overwhelmingly unlikely (eg any such chemical process taking rain/water/dirt to mud would require a huge decrease in entropy to spontaneously reverse)

              • “mud doesn’t cause rain” isn’t even really true. I mean, suppose you’re somewhere with an enormous quantity of mud, through time it dries up. The moisture goes into the air, somewhere else it precipitates out as rain. The water cycle!

                what “mud doesn’t cause rain” really means is “mud is what you get after you add water to dirt, so if you see mud it must be that at some point in the past, someone or some thing or some process added water to some dirt” which isn’t nearly as succinct, but is much less wrong.

              • ojm says:

                > The water cycle!

                Yeah exactly.

                I mean there are various formal frameworks for causal reasoning that I’m sure can address this but most of them still seem pretty clumsy to me.
