Econometrics: Instrument locally, extrapolate globally

Rajeev Dehejia sends along two papers, one with James Bisbee, Cristian Pop-Eleches, and Cyrus Samii on extrapolating estimated local average treatment effects to new settings, and one with Cristian Pop-Eleches and Cyrus Samii on external validity in natural experiments. This is important stuff, and they work it out in real examples.

17 thoughts on “Econometrics: Instrument locally, extrapolate globally”

  1. These papers deserve a better comment thread than this. I really like this kind of research – I call it “experimental econometrics” but “evaluation of methods” could work – and Dehejia has made contributions* to that since at least his response to LaLonde**.

    Then again, I told my student I would read it more carefully before we met on Monday, and that didn’t happen either. So I guess we are all too busy to learn about whether or not the things we do every day actually work.

    *http://www.uh.edu/~adkugler/Dehejia&Wahba.pdf
    **http://www.jstor.org/stable/1806062

    • I’m calling you out: since this isn’t my area of research, wading into the papers without specific questions in mind is hard. So, what do you think are the most important questions, issues, etc., in this kind of research, and how do these papers address them?

      Also, here’s a very basic question: At the heart of “external validity” is basically “causality”. The external validity of Newton’s equations is extremely high (near lightspeed notwithstanding) and that’s because we have real causal fundamental models of the forces.

      We don’t have real causal fundamental models of things like “elasticity of supply of the labor market” or whatever. So, pretty much all models of that sort of thing seem like they have to be at best valid for some sets of circumstances. There are a huge number of important variables involved about which we know nothing. Can we ever expect any kind of “strong” external validity of economic models?

      (strong validity would be something like: it was equally predictive in 1650, 1830, 1890, 1920, 1930, 1960, 1990, 2000 and in pretty much every well-developed country that existed in those time-periods).

      • “I’m calling you out: since this isn’t my area of research, wading into the papers without specific questions in mind is hard. So, what do you think are the most important questions, issues, etc., in this kind of research, and how do these papers address them?”

        I’m calling you out for not having been to the spot south of Pico with the Jamaican meat patties, and for being bad at Go. With that out of the way –

        The Very Big Question is: how likely is a treatment effect estimate to be a good prediction of the effects of a similar treatment in another context? The Big Question is: supposing we have an unbiased, precise treatment effect estimate in some location, how likely is that to be a good prediction in other contexts? The Question is: how can we use multiple, similar experiments across partially-observably different populations to gain insight about the Big Question and maybe speak to the Very Big Question?

        To answer the Question, they use multiple, similar “quasi-experiments” and relate their estimates to observable characteristics of the populations. In this case, that means an instrumental variables approach that is good if imperfect, but can be applied to data from a lot of different countries. Previous work in this area has compared non-experimental estimates (usually some matching/weighting thing) to experimental estimates, with the idea of seeing which non-experimental estimate comes closest to the experimental one (which one “constructs the best counterfactual”, or something like that). This one instead has lots of pseudo-experiments, treats them as experiments, and compares them to each other. I’d also put the literature on standard errors and rejection rates that runs placebo tests on real data in this literature.

        “Also, here’s a very basic question: At the heart of “external validity” is basically “causality”. The external validity of Newton’s equations is extremely high (near lightspeed notwithstanding) and that’s because we have real causal fundamental models of the forces.”

        Just to say – all of these kinds of projects need some external “benchmark” – either a known placebo effect put into the data by the researcher or an estimate considered “right”. In these two papers, they say basically “this is the best one we got that we can apply in a whole bunch of countries”. Not perfect, but discussed.

        “We don’t have real causal fundamental models of things like “elasticity of supply of the labor market” or whatever. So, pretty much all models of that sort of thing seem like they have to be at best valid for some sets of circumstances. There are a huge number of important variables involved about which we know nothing.”

        Agreed, and they agree insofar as they think that their effect of interest varies by country-level observables. That said, they are looking at one of those outcomes where we think we know what observables might influence the effect size/sign. They are estimating (over and over again) the effects of child-bearing on female labor supply. They do it cleverly (someone else’s cleverness from a while back): if your first two children are both boys or both girls, you have a higher probability of having another child (so the instrument shifts whether a family has a third child). So they instrument for having a baby with the genders of your first kids, and estimate the effect of having that baby on whether a woman goes back to work.

        We think this effect (supposing now it is a good estimate of “the true” local effect) should vary by things like potential wages for women and labor market protections – things that are reasonably considered observable. So part of the Question is how “far” you can go from the original experimental population to another population and still believe you know something about the likely treatment effectiveness. “Far” being measured along whatever covariate set you want in whatever way.
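        To make the IV logic concrete, here is a minimal simulation of the Wald/IV estimator with a same-sex-style instrument. All numbers (the probabilities, the 6-point first stage, the “true” -0.10 effect) are invented for illustration and have nothing to do with the papers’ actual estimates:

```python
# Sketch of the Wald/IV estimator with an invented same-sex-style instrument.
# Z = first two kids same sex, D = has a third child, Y = mother works.
import random

random.seed(0)
N = 200_000
rows = []
for _ in range(N):
    same_sex = random.random() < 0.5            # instrument Z
    # assume same-sex couples are ~6pp more likely to have a third child
    third_child = random.random() < (0.35 + (0.06 if same_sex else 0.0))
    # assume the "true" effect of a third child on working is -10 points
    works = random.random() < (0.60 - (0.10 if third_child else 0.0))
    rows.append((same_sex, third_child, works))

def mean(xs):
    return sum(xs) / len(xs)

# Reduced form: effect of Z on Y; first stage: effect of Z on D
y1 = mean([w for s, t, w in rows if s])
y0 = mean([w for s, t, w in rows if not s])
d1 = mean([t for s, t, w in rows if s])
d0 = mean([t for s, t, w in rows if not s])

# Wald / IV estimate = reduced form / first stage
late = (y1 - y0) / (d1 - d0)
print(f"IV estimate of the effect of a third child on working: {late:.3f}")
```

        With a homogeneous effect like this one, the Wald ratio recovers the built-in -0.10 up to sampling noise; with heterogeneous effects it would recover the local average treatment effect for couples whose fertility responds to the instrument.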

        ” Can we ever expect any kind of “strong” external validity of economic models? (strong validity would be something like: it was equally predictive in 1650, 1830, 1890, 1920, 1930, 1960, 1990, 2000 and in pretty much every well-developed country that existed in those time-periods).”

        Well: the relationship between price changes and quantity demanded in a semi-free market seems like one that holds pretty well. Wages increase on average with education. If you hire more labor you produce more.

        But my for real Well: I don’t think that is the standard we want Economics to strive for. What we want is for it to provide us with insights that improve the lives of people – technocratic improvements in optimal allocation of resources; predictions about short-/medium-term effects of policy changes; maintenance of the currency and the velocity of money.

        The idea that we could (or would want to) mathematically model how “humans” are and always have been just seems silly (something for Anthropologists or Philosophers or G-d). We don’t want to use the same model we use to predict inflation to also help us understand workplace discrimination or capital accumulation in England during the industrial revolution. We want multiple models that help us understand those things (likely multiple models of each). The common thread in our models tends to be a prominent place for tensions in people’s decision making that arise because they can’t do all of several things that they would like to do (buy more of product A AND product B; invest in labor and capital; enjoy leisure and earn a wage at the same time;…). But that doesn’t have to be modeled so as to be the same for every context everywhere everywhen. How would such a model be useful to anyone?

        • I agree with a lot of what you’ve said, so I’m just going to touch on a few issues for further thought:

          1) “Price changes and quantity demanded in a semi-free market seems like one that holds pretty well. Wages increase on average with education. If you hire more labor you produce more.”

          To me these are too qualitative to count in this question: I could as well say “things that fall from higher heights are going faster when they hit the ground” or “stuff that’s heavy is harder to move,” and that’s a lot different from Newtonian mechanics (note: I am not arguing that Economics should be like Newtonian mechanics, just the opposite really).

          “The idea that we could (or would want to) mathematically model how “humans” are and always have been just seems silly:”

          But there are lots of quantitative questions that we’d like to answer pre-data: what will banning pseudoephedrine sales OTC do to the street supply, demand, and price of methamphetamine? If we could have predicted that quantitatively and accurately, we could have avoided years of misery: people buy the new over-the-counter crap containing phenylephrine, which has now pretty much been shown to be highly ineffective, while the Mexican Sinaloa cartel has moved in, bringing more violence, twice the supply, and a reduced street price.

          Ok, my sinuses aside, there are lots and LOTS of policies where it’d be nice to be able to specifically and quantitatively estimate the effects of putting those policies in place, everything from setting the Fed interest rate, to altering the terms of Ag subsidies, to providing tax incentives to upgrade to energy-efficient A/C systems, whatever. It’s no good to say “if we provide subsidies, more people will upgrade their A/C than before” because we have to decide whether we should provide an A/C subsidy or an automobile subsidy, which requires knowing the relative magnitude of the effects on things we care about.

          There are quite a few universal biological/physical factors about humans: women have peak fertility at about age 29-30 pretty much worldwide. Men are x% heavier and x% taller than women on average. Caloric requirements are pretty well predictable based on activity levels. The effect of weather/temperature on people’s activity is predictable. Etc etc etc.

          Without that kind of universality in predictive ability, economic models are always going to be of the kind “we observed some stuff, and saw certain tradeoffs being made in this particular population at this particular time with this particular set of global geo-political issues going on”

          which is a good thing to do, but it will always always always lead to the question: “can we use that information to predict anything about this different set of circumstances?”

          Whereas, something with a certain amount of universality gives you predictive power out-of-sample: put a person in Antarctica, with a known wind speed, wearing a certain set of clothing, doing a certain amount of physical exertion, eating a certain amount of food, and I’ll predict something about their body temperature even if I only have data from people in 0F and 10F environments at lower wind speeds, because I can figure out information about how fast people metabolize and produce heat, how fast that heat conducts through the clothes, etc.

          So I think I disagree when you say we shouldn’t WANT universality. Some things are probably “kind of” universal, and we should try to use that information!

          That all being said, I still suspect that for most economic issues, the best we’re going to get is “we can’t go very far from the original conditions and still expect good predictions”

        • Suppose, for example, that in all human civilizations prior to the advent of the internal combustion engine the (mean) resources allocated per day to transportation as a fraction of the resources necessary to travel to the nearest food market and back home were between 1/3 and 1/5 as much as the resources allocated to food as a fraction of the resources required to feed a person a full set of healthy meals per day.

          Would economists even have detected such a thing? Is there a fundamental universality that might explain such a thing, such as the physical mass of the food required, the forces on transportation equipment, the speeds at which such equipment could travel… ?

          It’s more or less a tangy taste of how you might think about and try to identify issues of universality… I don’t know, I could be way off base, and I’m sure as people get wealthier the limitations of “universal” issues change (for a couple hundred grand you can orbit the earth with Russians for a weekend right?)

        • Sorry, I think I meant to say 3 to 5x not 1/3 to 1/5 based on the above wording. Basically, perhaps people tend to travel to buy 3 to 5 days worth of food at a time… and that is reflected in their choice of distance from markets, and amount of capital invested in transport technology… whatever. This is all a made up example to give the flavor not some kind of actual hypothesis.

      • “Also, here’s a very basic question: At the heart of “external validity” is basically “causality”. The external validity of Newton’s equations is extremely high (near lightspeed notwithstanding) and that’s because we have real causal fundamental models of the forces.”

        I’m not sure this is true. Is causation even a concept in Newtonian physics? And in what way are those equations fundamental? They make no reference to underlying mechanisms.

        Sure, they hold true under a remarkably large range of scope conditions, but this seems to have nothing to do with causation or fundamentals.

        • Newtonian mechanics more or less says “model the second derivative of position”. And Newton was able to write the model in the asymptotic case where the only effect of interest was “gravity”.

          But, in working through a lot of physics since then, we now know there are atoms made of nuclei and electrons bound together by coulomb forces, and there are magnetic forces due to currents of electricity, etc. Pretty much EVERYTHING that can be explained by Newton’s mechanics can be explained as a consequence of these fundamental forces. So, friction is all about the places where two surfaces get close enough to have coulombic interactions. Fluid mechanics of gases is pretty much about individual atoms colliding with each other or with the atoms that make up things… and the collisions are pretty much the net effect of an enormous number of coulombic-type interactions.

          So, we have at this fundamental level an explanation for how Newtonian forces arise: it’s basically Maxwell’s equations + gravity, and there’s nothing else unless you get outside the realm of what Newton’s laws are supposed to do (i.e. you want to predict what’s going on with individual electrons, atoms, whatnot, or at near lightspeed). Sure, we write down all kinds of “force laws” like frictional forces, and drag forces, and spring forces, but we understand them to be approximations to the net effect of a crapload of coulombic forces plus gravity.

          This universality means we don’t need to worry that suddenly some objects are going to start repelling each other with an equilibrium distance between gravitational attraction and anti-grav repulsion at ~ 1m but only when painted with a special paint.

          or some such thing.

          that’s what I mean about the fundamental causal model.

  2. It’s great to see this sort of work being done. The reweighting that Bisbee, Pop-Eleches and Samii use (which they credit to Abadie, 03 but which has its roots in Hirano, Imbens and Ridder, 00) could easily be adopted by empirical researchers in general.

    It seems to me that empirical economics would also benefit from a more structured approach to aggregating research results. I’d be interested in seeing meta-analyses on research with natural experiments. Surely some well-researched topics (Frisch elasticity, MPC, ..?) have enough well-done studies to merit this.
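    For concreteness, a random-effects meta-analysis along DerSimonian–Laird lines is just a few lines of arithmetic. A sketch in pure Python, with invented effect estimates and standard errors standing in for a set of natural-experiment studies of the same quantity:

```python
# Toy random-effects meta-analysis (DerSimonian-Laird). The per-study
# estimates and standard errors below are invented for illustration only.
estimates = [-0.12, -0.08, -0.15, -0.05, -0.10]   # per-study effect estimates
ses       = [ 0.03,  0.04,  0.05,  0.03,  0.06]   # per-study standard errors

# Fixed-effect (inverse-variance) pooled estimate
w_fixed = [1 / se**2 for se in ses]
mu_fixed = sum(w * e for w, e in zip(w_fixed, estimates)) / sum(w_fixed)

# Cochran's Q and the DL estimate of between-study variance tau^2
Q = sum(w * (e - mu_fixed) ** 2 for w, e in zip(w_fixed, estimates))
df = len(estimates) - 1
c = sum(w_fixed) - sum(w**2 for w in w_fixed) / sum(w_fixed)
tau2 = max(0.0, (Q - df) / c)

# Random-effects weights add tau^2 to each study's sampling variance
w_re = [1 / (se**2 + tau2) for se in ses]
mu_re = sum(w * e for w, e in zip(w_re, estimates)) / sum(w_re)
se_re = (1 / sum(w_re)) ** 0.5
print(f"pooled effect = {mu_re:.3f} (SE {se_re:.3f}), tau^2 = {tau2:.5f}")
```

    The between-study variance tau^2 is exactly the quantity at issue in external-validity questions: if it is large relative to the sampling variances, the studies genuinely disagree and a single pooled number hides the action.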

    • So what happens if you use not just the between-study variation in treatment effect, but the within study heterogeneity too? So replace GDP with a household wealth measure, estimate treatment effects across wealth within each experiment, and check the out-of-sample prediction power of the interactions.

      Does the within-sample heterogeneity in treatment effect predict the between-sample, to an order of magnitude? Can using both be helpful (so that both individual and community characteristics help predict treatment effectiveness)?

      I’m thinking a field like education (class size, peer quality) or health (drug effectiveness, sanitation), where there are lots of studies that estimate similar treatments interacted with similar covariates and done in lots of different places at different times. It could help us understand how much we do/don’t want to believe coefficient estimates on T*wealth or T*female in the next experimental seminar we attend.
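      A sketch of that exercise, assuming (purely for illustration) a data-generating process where the treatment effect declines linearly in wealth; study A’s within-sample heterogeneity is used to predict study B’s average effect from B’s covariate distribution:

```python
# Use within-study-A heterogeneity in the treatment effect (by wealth)
# to predict study B's average effect. Simulated data; the assumed DGP
# gives everyone a treatment effect of -0.05 - 0.10 * wealth.
import random

random.seed(1)

def simulate_study(n, wealth_mean):
    rows = []
    for _ in range(n):
        w = random.gauss(wealth_mean, 1.0)
        t = random.random() < 0.5                 # randomized treatment
        y = 0.5 * w + ((-0.05 - 0.10 * w) if t else 0.0) + random.gauss(0, 1.0)
        rows.append((t, w, y))
    return rows

def diff_in_means(rows):
    y1 = [y for t, w, y in rows if t]
    y0 = [y for t, w, y in rows if not t]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

study_a = simulate_study(100_000, wealth_mean=0.0)
study_b = simulate_study(100_000, wealth_mean=1.5)   # richer population

# Within study A: treatment effect in the low- and high-wealth halves
med = sorted(w for t, w, y in study_a)[len(study_a) // 2]
lo = [r for r in study_a if r[1] <= med]
hi = [r for r in study_a if r[1] > med]
eff_lo, eff_hi = diff_in_means(lo), diff_in_means(hi)
w_lo = sum(r[1] for r in lo) / len(lo)
w_hi = sum(r[1] for r in hi) / len(hi)

# Line through the two (mean wealth, effect) points, extrapolated to B
slope = (eff_hi - eff_lo) / (w_hi - w_lo)
intercept = eff_lo - slope * w_lo
w_b = sum(r[1] for r in study_b) / len(study_b)
predicted_b = intercept + slope * w_b
observed_b = diff_in_means(study_b)
print(f"predicted effect in B: {predicted_b:.3f}, observed: {observed_b:.3f}")
```

      The prediction works here only because the assumed heterogeneity really is linear in wealth and wealth is the only thing that differs between the populations; how far that breaks down in real data is exactly the empirical question.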

  3. There is also some very general work out there using Causal Graphs, e.g. Pearl, Judea, and Elias Bareinboim. “External validity: From do-calculus to transportability across populations.” Statistical Science 2014.

    From just skimming them, the papers Andrew pointed to do not generate any new theoretical insights, but show successful empirical applications using parametric methods. They should be special cases of the Pearl/Bareinboim theory.

    @Daniel Lakeland: The question of causality and external validity is “solved” for a pretty wide class of problems (see the work of Pearl). But of course it can only be solved by either running an experiment in the population of interest, or making qualitative judgments about what effects can be assumed to be zero in the population, or about how the experimental sample relates to the population of interest. The Pearl/Bareinboim paper puts it like this: “To be of immediate use, our method relies on the assumption that the analyst is in possession of sufficient background knowledge to determine, at least qualitatively, where two populations may differ from one another. This knowledge is not vastly different from that required in any principled approach to causation in observational studies, since judgement about possible effects of omitted factors is crucial in any such analysis. Whereas such knowledge may only be partially available, the analysis presented in this paper is nevertheless essential for understanding what knowledge is needed for the task to succeed and how sensitive conclusions are to knowledge that we do not possess.”
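    For what it’s worth, the simplest instance of Pearl and Bareinboim’s transport formula, P*(y | do(x)) = Σ_z P(y | do(x), z) · P*(z), is just a reweighting of stratum-specific experimental results by the target population’s covariate distribution. A toy illustration with invented probabilities:

```python
# Toy transport formula: reweight stratum-specific experimental results
# from the source population by the target population's P*(z).
# Z is the covariate on which the two populations are assumed to differ.

# Experimental results in the source population: P(y=1 | do(x=1), z)
p_y_given_dox_z = {"z=0": 0.30, "z=1": 0.70}

# Covariate distribution P*(z) in the target population
p_star_z = {"z=0": 0.2, "z=1": 0.8}

p_star_y = sum(p_y_given_dox_z[z] * p_star_z[z] for z in p_star_z)
print(f"transported P*(y=1 | do(x=1)) = {p_star_y:.2f}")  # prints 0.62
```

    The hard part, as the quote says, is the qualitative knowledge that Z is the right (and complete) set of variables on which the populations differ; the arithmetic itself is trivial.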

    • Julian:

      We discussed that paper, or some version of it, a couple of years ago on the blog. In particular, I suggested Pearl and Bareinboim integrate hierarchical modeling into their framework. With hierarchical modeling, I think they’ll be able to apply their method to many situations in which different groups or scenarios are related but not identical.

    • As Andrew pointed out there was a lot of discussion on this aspect before.

      Confusion seems to arise between 1. if you assume you know the causal structure (or some part of it), methods are valuable for correctly mapping out the implications of this in a given study (the logic of deduction, or as Ramsey coined it, the logic of consistency), and 2. when trying to learn about causal structure, how do you _best_ discern how you are wrong about representing the _true_ causal structure and get less wrong (the logic of induction, or as Ramsey coined it, the logic of discovery).

      Julian’s comment “But of course it can only be solved by either running an experiment in the population of interest…” did cover that aspect.

    • You might be able to argue that the *meaning* of causality has been “solved” (i.e. carefully defined) for this wide class of problems, but by NO means have actual causal models been constructed that properly explain very much at all in economics or any social science. For example, what is the causal model that explains (even in an approximate statistical way) say the spot price of a barrel of crude oil, or the labor force participation rate in the US?

      • Sure, that’s exactly what I meant. But of course you need a general definition and understanding of how to solve these complex problems before you can even try to solve any specific one of them. It turns out that causality and external validity for problems with observational data had not received precise treatment until very recently, so the verdict on lots of (all?) social science with respect to that is still open.
