The quote is from George Box, 1979.

And this:

Please can Data Analysts get themselves together again and become whole Statisticians before it is too late? Before they, their employers, and their clients forget the other equally important parts of the job statisticians should be doing, such as designing investigations and building models?

I actually think the current term “data scientist” is an improvement over “data analyst” because the scientist can be involved in data collection and decision making, not just analysis.

Box also wrote:

It is widely recognized that the advancement of learning does not proceed by conjecture alone, nor by observation alone, but by an iteration involving both. Certainly, scientific investigation proceeds by such iteration. Examination of empirical data inspires a tentative explanation which, when further exposed to reality, may lead to its modification. . . .

Now, since scientific advance, to which all statisticians must accommodate, takes place by the alternation of two different kinds of reasoning, we would expect also that two different kinds of inferential process would be required to put it into effect.

The first, used in estimating parameters from data conditional on the truth of some tentative model, is appropriately called Estimation. The second, used in checking whether, in the light of the data, any model of the kind proposed is plausible, has been aptly named by Cuthbert Daniel Criticism.

Box continued:

While estimation should, I believe, employ Bayes’ Theorem, or (for the fainthearted) likelihood, criticism needs a different approach. In practice, it is often best done in a rather informal way by examination of residuals or other suitable functions of the data. However, when it is done formally, using tests of goodness of fit, it must, I think, employ sampling theory for its justification.

He was writing in 1978, back before people realized the ways in which model criticism, exploratory data analysis, and sampling theory could be incorporated into Bayesian data analysis (see chapters 6-8 of BDA3 for a review).

Goodness of fit can be a poor indicator of model appropriateness. Spanos’ hobby horse is Ptolemy versus Kepler planetary motion: the Ptolemy model fits better (by R squared) but its residuals are “erratic.” This would seem to indicate that residual analysis is superior in some cases (unless one believes in Ptolemaic motion).

I’d not encountered this Kepler vs Ptolemy example before. It’s a great teaching example showing yet again the EDA principle of drawing the damn data instead of getting excited about long formulas.

The basic notion of Spanos’s misspecification testing (as interpreted by me: a model is misspecified if there are no parameter values for which the data lie in the typical set) seems unobjectionable. That said, I have a real big problem with the means by which Spanos has attempted to use this particular example to lend weight to his approach.

Check out Cosma Shalizi’s class slides on the subject, starting with slide 19. These slides summarize the main points of Spanos 2007 while skipping over the details of the model.

In brief, Spanos shows that the residuals of the Keplerian model fit to Kepler’s original n = 28 data set are indistinguishable from white noise, while the residuals of the Ptolemaic model fit to a data set of one Martian year (~2 Earth years) of *daily observations from the US Navy Observatory* (n = 687) show unmistakable autocorrelation. I don’t mind telling you that my jaw literally dropped when I realized that Spanos was checking the statistical adequacy of the two models on *two different data sets*.

The obvious question that leaps to mind is: is the Keplerian model adequate for the modern n = 687 data set? If we’re to conclude that Spanos’s misspecification approach is of value, it seems critical to show that it can distinguish statistically adequate models from statistically inadequate models *when those models are fit to the same data sets*! It also seems important to ask if the Ptolemaic model is inadequate for Kepler’s data: the story would not be nearly so stark if it should turn out that Spanos’s approach judges *both models* adequate when fit to the Kepler data set and inadequate when fit to the US Navy Observatory data set.

So where are the missing misspecification tests? I judge it a real and drastic failure of peer review that Spanos 2007 was published (even in a philosophy journal) without them.
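To make the "same data set" point concrete, here is a minimal sketch in Python. It is not Spanos's procedure and it uses neither Kepler's data nor the Observatory's: the data are synthetic (a sinusoid plus noise), the two "models" are toy stand-ins, and the names `fit_line` and `lag1_autocorr` are made up for illustration. The only point is that both models are fit to one and the same series before their residuals are compared.

```python
import math
import random

def lag1_autocorr(residuals):
    """Lag-1 sample autocorrelation of a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    numer = sum(centered[i] * centered[i + 1] for i in range(n - 1))
    return numer / denom

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

rng = random.Random(0)

# Synthetic data (an assumption for illustration): sinusoid plus Gaussian noise.
ts = [i / 10 for i in range(200)]
ys = [math.sin(t) + rng.gauss(0, 0.1) for t in ts]

# Model A: straight line in t (misspecified for these data).
a, b = fit_line(ts, ys)
resid_a = [y - (a + b * t) for t, y in zip(ts, ys)]

# Model B: linear in sin(t) (matches the process that generated the data).
ss = [math.sin(t) for t in ts]
c, d = fit_line(ss, ys)
resid_b = [y - (c + d * s) for s, y in zip(ss, ys)]

# Both residual series come from the SAME observations, so they are comparable.
r_a = lag1_autocorr(resid_a)
r_b = lag1_autocorr(resid_b)
print(f"lag-1 autocorrelation, model A: {r_a:.2f}")
print(f"lag-1 autocorrelation, model B: {r_b:.2f}")
```

As a rough rule of thumb, white-noise residuals should have a lag-1 autocorrelation within about 2/sqrt(n) of zero. A real misspecification analysis would look at much more than one lag.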

Yes, goodness of fit must be balanced against model complexity. See, for example, http://en.wikipedia.org/wiki/Minimum_message_length

Actually, that’s exactly the sort of thing Spanos rails against. One of his main claims is that model selection (including techniques that balance fit and model complexity like MML) provides only a comparison within a set of models and therefore fails to account for the possibility that *none* of the models is “statistically adequate” in Spanos’s sense.

Indeed, physics, chemistry and increasingly biology are built upon the principle of parsimony, and this is something that needs to be made much more explicit in teaching science at all levels.

The trope that anyone who is a “data scientist” or “data analyst” must needs become a Statistician is naive to the core. The real problem here is that the statistics community has not figured out how to teach people in other domains the core things they need to know to do valid research. What they need is the core statistical toolbelt to know how to do reasonable things all the time, to recognize when they’re going down a terrible path along the lines of the things you complain about on here all the time, and to know what to look up and where to look it up when they need to.

We don’t say that anyone doing science that involves writing computer code must become a Computer Scientist. We don’t say that anyone who uses calculus or DEs in their research must become a Mathematician or Applied Mathematician. The reason is that these fields have figured out how to build abstractions and teach students useful things in a short period of time, say a semester. (Math has the obvious advantage of being pushed down to kindergarten, obviously.)

Despite having taken stats at four academic levels in my career, I’ve never seen a stats class that has come close to this goal. Whereas when I look at some Data Science courses, like (admittedly biased) my boss Bill Howe’s just-started Coursera course (https://www.coursera.org/course/datasci), maybe we’re getting closer!

I’m inclined to agree with your criticisms of statistics teaching, but I have to point out that your analogies seem off: they imply correspondences like

data scientist : statistician :: code-writing scientist : computer scientist

data scientist : statistician :: differential-equation-using scientist : applied mathematician

These correspondences would tend to suggest that the job descriptions of working “data scientists” and working statisticians would not overlap very much. Is this really true? How do you see the distinction between data science on the one hand and statistics (in the mode of Box) on the other?

Oh, “code-writing scientist” and “DE-using scientist” ARE “data scientists”… assuming, that is, that they are using data in their science.

That definition of “data scientist” would seem to rob the phrase of all meaning beyond what is already implied by “scientist”, and indeed, the term has seen a fair bit of watering down.

Still, in terms of the sorts of jobs that get described as “data scientist”, here are the fruits of a quick Google search:

I wrote a reply, but it had a lot of links and has apparently been eaten by the spam filter…? Hopefully AG will resurrect it at some point.

The analogies seem appropriate to me. Many data scientists (especially the code-writing and/or DE-using sorts) _are_ primarily trained in CS or App Maths (or Engineering), just like many are primarily trained in Stats. The point is that, since the day-to-day activities of data scientists combine skills from all of these disciplines, there is no reason to suppose that Stats should be privileged over the others in the primary training of a data scientist.

+1

I’m not sure what the solution is but you are describing a very real problem.

> the core things they need to know to do valid research.

> What they need is the core statistical toolbelt to know how to do reasonable things all the time

I don’t know these, maybe Andrew does, more likely Box did, but these are inferences that are far from known to be valid.

@Daniel Halperin: Given your goals, you probably wanted an applied regression class. Mitzi (my wife) took one at NYU that she really liked. And I bet Jennifer’s teaching out of her and Andrew’s book is great (also applied regression).

I don’t think anyone has figured out how to teach someone even basic differential equations in a semester, unless of course they already have three semesters of calculus! And even then, you hardly develop a working knowledge of diff eqs in a semester. At least I didn’t; it took applications to put everything into perspective, and analysis until I actually understood what was going on at any level of generality. And I still don’t really understand the multivariate cases or PDEs very well at all.

Much of the problem with stats education is that it’s like we’re trying to teach people differential equations without teaching them calc first. I get the motivation — everyone’s impatient and math is hard, and unlike diff eqs, lots of people seem to think they need a superficial intro to stats. To use another analogy, it’s like we’re trying to teach students template metaprogramming in C++ before we teach them what a compiler does or what a for-loop is.

I really wish someone would give me a “core statistical toolbelt” that would show me how to do “reasonable things all the time”. I’ll settle for a core calculus or programming toolbelt if the statistical ones are all spoken for.

+1

These discussions of data scientists inevitably lead me to wonder what other kinds of scientists there are. What’s a non-data-based scientist? A mathematician?

I have the same question about evidence-based medicine. What other kind is there? Witch doctoring?

I would say the main “other” kind of medicine is maybe “model based” medicine. This is the medicine where people take some basic idea about biology, make some conclusion based on it, and then treat diseases based on that conclusion without regard to whether the original model was adequate.

Examples include things like “kidney stones are calcium based, so if we reduce the calcium in people’s diets they will get fewer kidney stones” (actual evidence says that increasing the calcium in your diet binds oxalate in the food and prevents it from circulating in the bloodstream hence reducing kidney stones).

Such a LARGE body of medicine is basically this kind of witch doctoring that people invented a new term, “evidence based medicine”, to contrast with it.

Essentially “evidence based medicine” is what Andrew would call “model checking”.

+1

Evidence-based medicine is a good idea to bring up when faced with 1) someone who is trying to tell you that vaccines are bad because they contain chemicals and cause autism (new-age-based medicine?); or 2) someone who is trying to tell you that they don’t believe in illness and if they just pray hard enough their cancer/bad back/kidney stones will go away (christian-science-based medicine); or 3) someone who thinks the way to reduce unwanted pregnancies is by telling teenagers that sex is evil (mainstream evangelical christian medicine?); or 4) someone who is trying to tell you that if they just drink enough wheatgrass/kale juice their cancer/bad back/kidney stones will go away (wtf-based medicine?) … We have come up with lots of alternatives to evidence-based medicine… and I imagine a witch doctor might be more useful than some of these.

Bob: “I have the same question about evidence-based medicine. What other kind is there? Witch doctoring?”

The other kind of medicine is the one we have: a mixture of science, ego, career ambitions, do gooders, and profits.

Science is a profoundly human endeavor.

PS If I had to put a number I’d say 10-30% of modern medicine is no different to witch doctoring.

PPS I’m not a vaccine denier (got one yesterday!), just cognizant that science is not made by dispassionate machines.

That was me in previous comment. Forgot to add name.

PPPS. In the case of “nutrition science” I’d say 50-100% is mindless regression-driven witch doctoring.

+1

You’d be surprised. But just for fun: http://home.comcast.net/~jasoncillo/Alternative%20EBM.pdf

I like this quote:

“Statistical practitioners have known for a long time that, prior to using the methods that most textbooks emphasize, there is a very important and largely neglected phase of activity which Fisher called specification and which has also been called model identification. This involves informal techniques of analysis of data, many of them graphical, aimed at looking at the data in a preliminary and exploratory way in order to help understand what questions should be asked and what tentative models might be entertained. Until recent years, however, this process was regarded by the majority as not entirely respectable. Like the black art, it was widely felt that it should be conducted, if at all, only behind closed doors.”

A similar quote from William Deming. His emphasis on order of appearance is important:

“It is important to remember that the mean, the variance, the standard error, likelihood, and many other functions of a set of numbers, are symmetric. Interchange of any two observations x_i and x_j leaves unchanged the mean, the variance, and even the distribution itself. Obviously, then, use of variance and elaborate methods of estimation buries the information contained in the order of appearance in the original data, and must therefore be presumed inefficient until cleared.

Pencil and paper for construction of distributions, scatter diagrams, and run charts to compare small groups and to detect trends, are more efficient methods of estimation than statistical inference that depends on variances and standard errors, as the simple techniques preserve the information in the original data. In fortunate circumstances (normal estimates, independence, absence of patterns), and when the whole study went off as intended, one may indeed summarize the results of comparisons as confidence intervals or fiducial intervals, making use of standard errors. But these circumstances require demonstration by simple methods of pencil and paper [1], [7], [21].”

https://www.deming.org/media/pdf/145.pdf
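Deming's symmetry point is easy to check numerically. The sketch below is only an illustration on made-up numbers (a drifting series, not data from any study mentioned here), and `lag1_autocorr` is a helper written for this example. Shuffling the observations leaves the mean and variance untouched, while a simple order-dependent summary of the kind a run chart shows by eye changes completely.

```python
import math
import random
import statistics

def lag1_autocorr(xs):
    """Lag-1 sample autocorrelation; depends on the order of the observations."""
    n = len(xs)
    mean = sum(xs) / n
    centered = [x - mean for x in xs]
    denom = sum(c * c for c in centered)
    numer = sum(centered[i] * centered[i + 1] for i in range(n - 1))
    return numer / denom

rng = random.Random(1)

# Made-up series with an upward drift over time (assumption for illustration).
ordered = [0.1 * i + rng.gauss(0, 1) for i in range(100)]

# Same numbers, order of appearance destroyed.
shuffled = ordered[:]
rng.shuffle(shuffled)

# Symmetric summaries: unchanged by any interchange of observations.
same_mean = math.isclose(statistics.mean(ordered), statistics.mean(shuffled))
same_var = math.isclose(statistics.variance(ordered), statistics.variance(shuffled))

# Order-dependent summary: large for the drifting series, near zero once shuffled.
r_ordered = lag1_autocorr(ordered)
r_shuffled = lag1_autocorr(shuffled)

print("mean unchanged:", same_mean, "| variance unchanged:", same_var)
print(f"lag-1 autocorrelation, original order: {r_ordered:.2f}")
print(f"lag-1 autocorrelation, shuffled:       {r_shuffled:.2f}")
```

A report that gives only the mean and a standard error cannot distinguish the two series, which is the argument for plotting the observations in order before summarizing them.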

There really isn’t much excuse for not sharing the raw data along with the publication these days; I wonder when that will end. Mean + error bars is not sufficient to understand what questions should be asked of the data, and this type of presentation really limits the usefulness of a research report. The data is more important than the researcher’s opinions about how to analyze and interpret it. In many cases I don’t think we even know what it is that needs to be modelled in (what Box calls) “contaminated” data sets.

From my own work with rodent behaviour I have seen many indications that the data from each rat is not independent. For example, many rats perform very well one day and very poorly the next, or the rats are studied in different “cohorts” (e.g., 12 at a time) and a treatment effect appears in one cohort but not the next. Whatever causes this is just as strong an effect as any experimental manipulation but we have no idea what it is. For 30 years people have been running similar experiments without noting this; I suspect they just never looked at the individual data (or perhaps some did but thought it was “messy” and thus embarrassing). Instead the reports are average data each week for each animal, then average those again for each group of animals ignoring cohort. The shapes of the group average curves don’t look like the individual average curves, which don’t look like the plots of the data points.

There have been previous reports of this type of stuff going on (effects of cage position) that have also gone ignored:

“Unexpected findings in a mouse study in which the safety of FD&C No. 40 (Red 40) was examined led to additional experimentation and to new statistical analyses and models. The possibility of acceleration of tumors raised questions about an operational definition of acceleration and of appropriate statistical methods for assessing acceleration, especially in the face of data dredging. The evaluation of Red 40 was further complicated by cage and litter effects and the multigenerational design. In this report the investigations of these studies are reviewed and are used to illustrate how new scientific work can emerge through the regulatory process. A number of issues in animal experimentation that need to be examined are indicated.”

http://www.ncbi.nlm.nih.gov/pubmed/6935460

Unfortunately there is little interest in such matters amongst biomed researchers since this is all “noise” that averages out.

I’d love to be able to make experiments with rats. The furthest I ever went was experimenting with clams while preparing a paella.

I put the live clams in individual glasses with some salt water. Then disturbed them, and measured time between the clam closing and opening again. I wanted to know whether some were more shy than others.

I think I added them to the paella before the experiment was over….

Fernando,

It is interesting (even fun) work, until the end… I think far too much info is wasted (either not collected and/or not reported) from these animals we are killing for research purposes. My experience has changed my mind regarding animal research. While I do think it can be informative and useful enough to be worth it, the way the data is analyzed and interpreted does not make this true in practice. The vast majority of these animals die for nothing beyond advancing someone’s career.

question:

Just to be clear, I was not planning to kill rats etc. (though I did kill the clams by cooking them alive in the paella, but somehow that is ok to non-vegetarians).

As a social scientist I am more interested in testing behavioural theories e.g. http://www.jstor.org/discover/10.2307/2936137?uid=3739256&uid=2&uid=4&sid=21103926360771

But I agree with you on the important ethical responsibility in carrying out experiments with all animals, including humans. This extends to ensuring adequate power, having clear protocols, and well grounded analyses.

Fernando,

I didn’t mean to suggest you were going to go out and kill rats. At least in the US, even if you did only behavioral (no histology etc.) work with rats, you would need to pay for the housing in a certified facility, which costs $1-3/day per cage. Space is also often limited, so usually there is someone there who “sacrifices” them when the study is complete.

+1

Andrew,

“The great advantage of the model-based over the ad hoc approach, it seems to me, is that at any given time we know what we are doing” (George Box, 1979)

Thanks for this great quote, which I intend to use in discussions with those who think that “models” mean models of data as opposed to models of the “data-generating process”.

Box wrote it in 1979, before people realized that the models that tell us “what we are doing” or what data to collect are not really models of the data distribution but of the “data-generation process”. The latter, unfortunately, has not been given face in statistics textbooks or mainstream statistics literature.

1979 was also before people realized that the information we need to extract from the “data-generation process” will forever remain beyond the scrutiny of data, regardless of sample size, and this includes the “examination of residuals” or any form of sampling theory, however sophisticated.

Spanos’ papers attest to the enduring illusion among (some) economists that a methodology based on “data first, theory second, if at all” can replace model-based approaches.

When I last visited Columbia University, I noticed that the institute in which I gave my lecture was named “Institute for Data Sciences and Engineering.” Naturally, I asked my hosts: “Do we need a science of data, or a science of reality that produces data?”

My concern was (and still is) that, by prioritizing data, the data-generating process will remain face-less in textbooks, research, and blog discussions.

Corey,

Here is a summary of my readings of Spanos’ recent papers:

(quoted from http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf)

“The idea that an economic model must contain extra-statistical information, that is, information that can not be derived from joint densities, and that the gap between the two can never be bridged, seems to be very slow in penetrating the mind set of mainstream economists. Hendry, for example, wrote: “The joint density is the basis: SEMs are merely an interpretation of that” (Hendry, 1998, personal communication). Spanos (2010), expressing similar sentiments, hopes to “bridge the gap between theory and data” through the teachings of Fisher, Neyman and Pearson, disregarding the fact that the gap between data and theory is fundamentally unbridgeable. This “data-first” school of economic research continues to pursue such hopes, unable to internalize the hard fact that statistics, however refined, cannot provide the causal information that economic models must encode to be of use to policy making.”

But I am open to change my mind upon evidence that Spanos acknowledges that economic models can never be derived from joint densities.

The “data-first” school of research is not limited to economics. It is certainly an issue also in genomics, and it is addressed for example in Nature by the exchange on hypotheses vs data first between Robert Weinberg and Todd Golub.

+1. I die a little inside every time someone says they’d like to see an “unbiased analysis” in discussions of genomics.

I just read the Weinberg and Golub pieces. Both seem to acknowledge that the other type of research is also worthwhile. But there seems to be a competition for funding involved.

So for you, if I understand you correctly, the slogan “data first, theory second (if at all)” refers to an attitude that fails to understand that causal information cannot be derived from a joint pdf alone. I don’t know where Spanos stands on that question. The text of yours that I quoted can be read much more broadly than this (a reading that I now see you did not intend) and Spanos cannot be indicted under that broad reading.

It’s surprising how widespread this misconception is.

I was discussing causal graphs with a _statistician_ and he said, “but these aren’t useful because we usually want to figure out what the causal structure is”, as if that could be determined from the application of a hypothesis test absent causal assumptions.

Anonymous,

I am surprised that a statistician said “we usually want”? When was the last time that you saw a statistician wanting to figure out what the causal structure is? Take for example this blog, which is quite popular among statisticians. Has anyone ever asked about the tools available for structure learning? (I recall only one occasion, when Andrew Gelman reviewed Sloman’s book, with obvious dismissal). Other than this occasion, statisticians do not grant reality any representation, let alone “causal structure.” Examine one by one all statistics textbooks written from 1900 till 2014, graduate and undergraduate, mathematical and popular. I would be thrilled to find one in which reality (as opposed to distributions) is given a face. It has been a total and consistent denial.

Judea:

I’m getting tired of this sort of rhetoric. You write that, other than in a review essay that I published in a sociology journal, “statisticians do not grant reality any representation . . . Examine one by one all statistics textbooks written from 1900 till 2014, graduate and undergraduate, mathematical and popular. I would be thrilled to find one in which reality (as opposed to distributions) is given a face . . .”

In my two statistics textbooks, which were indeed written between 1900 and 2014, we have examples from education research (the effects of coaching on college admissions tests and the effects of a TV program on kids’ learning), public health (decision analysis for home radon measurement and remediation), political science (incumbency advantage in elections), toxicology, psychometrics, public opinion, criminology, experimental economics, and a zillion others. To say that we “do not grant reality any representation” is just insulting. There are many different ways to do science, and many ways to do statistics. Just because we don’t happen to do things your way, that doesn’t mean we’re not studying “reality.” I would appreciate a bit of pluralism, but if you don’t have that in you, I’d appreciate a bit of politeness and a bit of respect to people who spend their careers studying reality and just happen not to use your favored methods.

Andrew,

This is not rhetoric but an astute observation that I am prepared to defend.

First, note that I did not mention “my representation” even once, I asked for “any representation”.

Second, I would be thrilled to hear what representations you used in the many scientific areas that you listed. Mathematical representations have names, e.g., algebraic equations, regression models, programs, structural equations, distributions, graphs, logical statements, differential equations, counterfactual statements, etc.

Each of these has capabilities and limitations, and each has a set of questions it can answer and a set it cannot answer.

No need to get insulted; if you can name the representations that you used to describe reality, I would be the first to take back my observation if it turns out to be wrong, or if it calls attention to a deficiency that does not exist.

Try me.

Just one last reminder, we are looking for a representation of reality (the data-generating process), not of the analyst’s routines, nor of the data.

Judea:

You call it an “astute observation” that “statisticians do not grant reality any representation . . . Examine one by one all statistics textbooks written from 1900 till 2014, graduate and undergraduate, mathematical and popular. I would be thrilled to find one in which reality (as opposed to distributions) is given a face . . .”

My books are full of analyses of real problems. My colleagues and I are representing reality all the time.

@judea pearl @Andrew:

Is the problem just that the two of you have a different idea about what the word “representation” means? Using ordinary words like “cause” and “represent” as technical terms is fraught with peril due to the likelihood of being misunderstood.

If Andrew’s and Jennifer’s book on regression, including numerous chapters on causality, doesn’t qualify as presenting a representation, then I think everyone can agree to differ on what the right definition of “representation” is.

Clearly, Andrew and Jennifer are thinking about causality at least insofar as they put the word in their chapter titles. And they’re clearly giving priority to “reality” (as opposed to making things up), unless that has a narrow philosophical interpretation that needs to be clarified.

As another example to help refine terminology, are Newton’s differential equations a representation of motion in general? Or of our solar system in particular when you plug in the positions, masses, and velocities of the sun, planets, and moons? Were I to add a measurement error model for the human-operated telescope I use to measure those positions, is that a representation of reality (the reality of the solar system and the reality of the measurement instrument in particular)? Again, if not, then we will just have to differ on what “reality” means.

P.S. I’m completely ignorant as to whether Newton was thinking about “causation” or “reality” in the sense that Judea Pearl intends. I also have no idea what contemporary physics makes of causality, determinism, intrinsic randomness vs. unexplained variation, etc.

Judea: Are you asking for examples of simulations of events à la weather forecasts? Or of first-principle equations à la physics? If you could give three or four concrete examples (from whatever fields you think applicable) of what you’re asking for, it’d be much more useful. I’m not sure I’d recognize what you’re asking for if I personally did search all statistical textbooks.

I do see a lot of non-statisticians who think they’re being all statistician-like by treating their data as an abstract quantity that can be manipulated with generic statistical procedures. There’s no basis in an underlying reality and no concern for a generating process, it’s just generic numbers through generic procedures to get acceptable results that are then interpreted using domain-knowledge words. Is that what you’re objecting to?

Andrew: I don’t think Judea is saying that you don’t address real-world problems, nor even that you don’t address such problems rigorously and with subject matter expertise in mind. I think he’s saying something about whether your statistical models mirror the “physics” of what you’re modeling. Though perhaps that’s just me being charitable… depends on what examples he can muster.

Andrew, Bob, Wayne, et al,

My observation that “statisticians do not grant reality any representation . .” requires explanation on my part, and requires serious attention by statisticians.

By “representation” we mean a mathematical object from which one can deduce answers to a set of questions.

By “representation of reality” we mean an object from which we can deduce answers to a broad set of questions, as if we had access to reality itself.

A joint density, for example, is a representation that can answer questions of prediction and retrodiction (e.g., find the likelihood of a disease given a symptom) but does not answer many questions that we could answer if we had access to reality itself, for example, the likelihood of a disease given that we cure the symptom.

The answer to the last question can be obtained for example from a program that simulates the relationship between symptoms and diseases, the kind of program statisticians use for sensitivity analysis. It can also be gotten from other representations.
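A toy version of such a simulating program might look like the sketch below. Everything in it is an assumption made for illustration: the one-arrow structure (disease produces symptom, never the reverse), the probabilities 0.10, 0.90 and 0.05, and the function names `simulate` and `prob_disease`. It contrasts conditioning on the absence of the symptom with intervening to remove it.

```python
import random

def simulate(n, rng, cure_symptom=False):
    """Draw (disease, symptom) pairs from a toy model in which disease -> symptom."""
    draws = []
    for _ in range(n):
        disease = rng.random() < 0.10          # assumed base rate
        if cure_symptom:
            symptom = False                    # intervention: symptom removed by fiat
        else:
            # assumed symptom rates with and without the disease
            symptom = rng.random() < (0.90 if disease else 0.05)
        draws.append((disease, symptom))
    return draws

def prob_disease(draws, symptom=None):
    """Fraction diseased, optionally among draws with the given symptom status."""
    kept = [d for d, s in draws if symptom is None or s == symptom]
    return sum(kept) / len(kept)

rng = random.Random(2)
observed = simulate(100_000, rng)
intervened = simulate(100_000, rng, cure_symptom=True)

# Conditioning: what the joint density of (disease, symptom) can answer.
p_given_symptom = prob_disease(observed, symptom=True)
p_given_no_symptom = prob_disease(observed, symptom=False)

# Intervening: needs the simulator (or another causal representation).
p_after_cure = prob_disease(intervened)

print(f"P(disease | symptom observed)     = {p_given_symptom:.3f}")
print(f"P(disease | symptom not observed) = {p_given_no_symptom:.3f}")
print(f"P(disease | symptom cured)        = {p_after_cure:.3f}")
```

Under these assumed numbers, seeing no symptom makes the disease very unlikely, while curing the symptom leaves the disease rate at its base rate. The two observational quantities can be computed from the joint distribution alone; the third cannot, because it depends on which way the arrow points.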

I go back now to my observation that “statisticians do not grant reality any representation” and I would like Andrew to identify by name the kind of mathematical object he used in his textbooks that allows us to encode the sentence “symptoms do not cause diseases”, or, if you do not like causation, identify the representation from which an analyst can predict that curing the symptom will not cure the disease.

Such representations do exist, they are sometimes taught in the social sciences and econometrics, but they do not appear in any of the 26 books that are now smiling at me from my “statistics textbooks” shelf. Please check your shelf and let me know if you find any.

And please urge Andrew to tell us what representation his book recommends for a modeller who wishes to express an innocent piece of knowledge: “Aspirin might lower your fever, but will not cure your flu”.

The distinction I make is worth looking into.

Judea:

I know two statistical textbooks (“Longitudinal Data Analysis” by Fitzmaurice et al and “Statistical Methods for Epidemiology” by Jewell) that have chapters with the representation you are referring to.

What about you (and other causal inference researchers) contributing a chapter in commonly used statistical textbooks?

Judea, be thrilled: Gelman and Hill 2007, Chapter 9, is all about causal inference in the Neyman-Rubin potential outcomes framework; which framework you have proven to be isomorphic to causal graphs (if rather less transparent).

@Andrew:

I suspect that in Judea’s worldview anything other than using a DAG does not “grant reality any representation”? Is my characterization right, Judea? What approaches do you accept under your rubric of granting reality a representation?

Judea:

You write, “By ‘representation’ we mean a mathematical object from which one can deduce answers to a set of questions. By ‘representation of reality’ we mean an object from which we can deduce answers to a broad set of questions, as if we had access to reality itself.”

I’m not quite sure what you mean by “reality itself” but we have many many examples in our books in which we model underlying processes. For example, we estimate what would happen in the human body if a person is given some exposure to perchloroethylene—this is what Bois, Jiang, and I were doing in our 1996 paper and the example is discussed in BDA. For another example, we estimate the costs in dollars and lives of various measurement and remediation strategies for home radon—this is another example that has appeared in various publications including BDA. In both of these examples and in many others, we set up and estimate a model that allows us to talk about what might happen under various input conditions. Indeed, this sort of model is standard in the physical sciences. Before I ever studied statistics, I was programming finite-element models to simulate temperature patterns in some plastic sheets that were being sent into space, under different shielding conditions. That example does not happen to be in BDA, but some version of it could’ve been.

Again, for you to speak of all this work, that happens not to use your favored methods, as having “been a total and consistent denial” of reality, is non-pluralistic, impolite, and indeed silly.

CK etal,

I do not have access to Fitzmaurice et al., but looking at the TOC and index on Amazon, I doubt this text is likely to change statistics’ avoidance of reality. (80% of the index is on regression, covariance, likelihood, prediction and other terms of descriptive models.)

As to Nick Jewell, he interviewed me last week and did not voice any reservation about my claim that mainstream statistics has denied face to reality. (The interview will be posted (by Wiley) in JSM.)

Nevertheless, epidemiology is the most progressive among all statistics’ satellites (see the books by Greenland et al., van der Laan and Rose, Hernán and Robins, VanderWeele, and more) and statistics proper will eventually follow suit, read my lips.

Corey etal,

The potential outcomes framework is an important step towards representing reality’s behavior under interventions. It allows us to state mathematically (albeit clumsily) that taking an aspirin will cure the symptom but not the disease. It is a great leap forward, but, unfortunately, stating such assertions is not the same as deducing them from some meaningful representation of reality.

The framework requires (essentially) that we state explicitly all the answers to the questions we intend to ask, rather than derive those answers, upon demand, from a compact set of meaningful assumptions about reality.

Still, Gelman and Hill should be commended for devoting a whole chapter to this framework. I hope their next edition will include a chapter on how potential outcomes can be derived from transparent representations of reality.

Rahul,

No, your characterization of me is wrong. As a computer scientist, my preference of representations is not a matter of taste, it is a matter of technical evaluation of merits. I would accept any mathematical representation that will allow us to encode what we believe about reality and then use that encoding to answer questions of three types: “What if we do XYZ?”, “What if we did things differently?”, and “Why did XYZ occur?”

You can argue that demanding answers to all three types of questions is asking too much, and that having a joint distribution is good enough for representing reality. I would respect that. But if you agree with me that modeling reality entails the ability to answer these three questions, then I would like to hear your suggestion of how to do it.

Please try it on a simple problem and you will appreciate how broad minded, pluralistic and accommodating my worldview is.

Seriously, have you tried it?

Andrew, etal,

I have explained several times what I mean by “modeling reality”: using a mathematical representation that allows us to encode what we believe about reality so that we can later answer questions of the type: “What if we do XYZ?”, “What if we did things differently?”, “Why did XYZ occur?”

If you have such a representation, please share with us its name. Does it have a name? Does it have a theory behind it? Literature? Tradition? Can you show us how the three questions above are answered on three binary variables, X, Y, Z? Say disease, symptom, and aspirin.

Why is it so hard?

I know that “this sort of model is standard in the physical sciences”, but is it standard in statistics? If it is, then it should have a name (e.g., regression, potential outcome, simulation program, logical statements, structural equations, differential equations, counterfactual logic, finite state machine, etc.; see how pluralistic I am?) and it should be easy for you to show us (in just three lines) how the three questions above are answered on three binary variables, X, Y, Z.

Why is it so hard? Why hide the name?
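For readers who want to see what answering the three question types on three binary variables could look like, here is a rough sketch using a toy structural model (flu and aspirin both affect fever). It is not anyone's worked answer from this thread: the structure, the exogenous probabilities, and the names `P_U`, `model`, `worlds`, and `prob` are all assumptions made up for illustration. Probabilities are computed exactly by enumerating the exogenous states.

```python
from itertools import product

# Assumed exogenous probabilities (illustrative only):
# u_flu: has flu; u_asp: takes aspirin; u_resp: aspirin works for this person.
P_U = {"u_flu": 0.2, "u_asp": 0.5, "u_resp": 0.8}

def model(u, do=None):
    """Structural equations for flu -> fever <- aspirin; `do` overrides equations."""
    do = do or {}
    v = {}
    v["flu"] = do.get("flu", u["u_flu"])
    v["aspirin"] = do.get("aspirin", u["u_asp"])
    # Fever accompanies flu unless aspirin was taken and this person responds to it.
    v["fever"] = do.get("fever", int(v["flu"] and not (v["aspirin"] and u["u_resp"])))
    return v

def worlds():
    """Enumerate every exogenous state together with its probability."""
    names = list(P_U)
    for bits in product([0, 1], repeat=len(names)):
        u = dict(zip(names, bits))
        w = 1.0
        for name in names:
            w *= P_U[name] if u[name] else 1 - P_U[name]
        yield u, w

def prob(event, do=None, evidence=None, cf_do=None):
    """P(event under cf_do, or under do if cf_do is None | evidence under do)."""
    num = den = 0.0
    for u, w in worlds():
        factual = model(u, do)
        if evidence and any(factual[k] != val for k, val in evidence.items()):
            continue  # this exogenous state is inconsistent with what was observed
        den += w
        target = model(u, cf_do) if cf_do is not None else factual
        if all(target[k] == val for k, val in event.items()):
            num += w
    return num / den

# "What if we do XYZ?": give everyone aspirin.
p_fever_do_aspirin = prob({"fever": 1}, do={"aspirin": 1})
p_flu_do_aspirin = prob({"flu": 1}, do={"aspirin": 1})

# "What if we did things differently?": no aspirin and a fever were observed;
# would the fever still be there had aspirin been taken?
p_fever_had_aspirin = prob({"fever": 1},
                           evidence={"aspirin": 0, "fever": 1},
                           cf_do={"aspirin": 1})

# "Why did XYZ occur?": flu, aspirin, and no fever were observed;
# would there have been a fever without the aspirin?
p_fever_without_aspirin = prob({"fever": 1},
                               evidence={"flu": 1, "aspirin": 1, "fever": 0},
                               cf_do={"aspirin": 0})
```

In this toy model aspirin lowers the chance of fever but leaves the chance of flu unchanged, which is the "aspirin might lower your fever, but will not cure your flu" statement. Everything here depends on the assumed equations; different assumptions would give different answers from the same observed data.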

Judea:

You write: “Andrew, et al, I have explained several times what I mean by ‘modeling reality’ — using a mathematical representation that allows us to encode what we believe about reality so that we can later answer questions of the type: ‘What if we do XYZ?’ ‘What if we did things differently’ ‘why did XYZ occur’.”

As I wrote in my comment above, we have many many examples in our books in which we model underlying processes. For example, we estimate what would happen in the human body if a person is given some exposure to perchloroethylene—this is what Bois, Jiang, and I were doing in our 1996 paper and the example is discussed in BDA. For another example, we estimate the costs in dollars and lives of various measurement and remediation strategies for home radon—this is another example that has appeared in various publications including BDA. In both of these examples and in many others, we set up and estimate a model that allows us to talk about what might happen under various input conditions. Indeed, this sort of model is standard in the physical sciences.

Now, if you want to say that the above examples are not “models of reality” because they do not address “why did XYZ occur,” then all I can say is that you have a different definition of “reality” than I do. To me, a statement about what would happen inside the body given different exposures of perchloroethylene is a statement about reality.

Again, for you to speak of all this work, that happens not to use your favored methods, as having “been a total and consistent denial” of reality, is non-pluralistic, impolite, and indeed silly.

Also, no big deal but my book with Jennifer has three chapters on causal inference, not just one.

@judea

I understand where you’re coming from and I mostly agree with your argument regarding the deficiency of probabilistic representation. But let me unpack this sentence a bit:

‘I know that “this sort of model is standard in the physical sciences”, but is it standard in statistics?’

In the physical sciences, modeling is done by writing down mathematical equations. Even though physical scientists are indeed interested in causal relationships, the distinction between causality vs. correlation in the written equations remains implicit or not well defined (for example in a conservation equation where directionality would have to be considered bidirectional), just as it is in a regression equation.

Thus with regard to representation, the physical sciences are no better than applied statistics in the sense that they don’t have a notation which distinguishes causal from correlative relationships. In practice, physical sciences might do a better job disentangling causality because there’s both a tradition and capacity for less ambiguous study designs than say, epidemiology or psychology, but this is a separate issue from that of representation.

One could argue that in the era of big, observational, datasets, it’s _because_ study designs are becoming increasingly ambiguous that explicit representations are more necessary now than ever, and I would agree with that. However I don’t think the physical sciences ever got the representation right in a way that statistics does not.

Anonymous,

You say: “I don’t think the physical sciences ever got the representation right in a way that statistics does not.”

I agree. However, fellow researchers and educators on this blog would be thrilled to know that through a symbiosis of ideas from the physical sciences, statistics, graphs and counterfactual logic we now have a representation that does things right. And by “right” I mean, again, a representation that permits us to answer the questions “What if I do?”, “What if I did things differently?” and “Why?”

Andrew claims (I paraphrase): it is all in my book, so what is the big deal? Unfortunately, we are still waiting to hear from Andrew how we should take three variables (e.g., aspirin, flu and fever) and answer these three questions, given observational or experimental data.

And while we are waiting, we should remind ourselves that part of being a scientist is to recognize when you have a breakthrough (if you have one), treat it like a breakthrough when it deserves the title, and show other researchers that they, too, should treat it as a breakthrough, whenever they can do things now that they could not do before.

So, while it may all be hidden in Andrew’s book, my duty as a scientist is to risk being called un-pluralistic, overseller, narrow-minded, denigrator of other approaches, and worse, and tell fellow workers: “We do have a breakthrough!” We can do things today that statistics could not do yesterday, for lack of an appropriate language.

So, enjoy, and use for advantage.

Apropos, one thing we can do today which we could not do yesterday is going from assessing “effects of causes” to finding “causes of effects”, something that every science (of reality) strives to achieve. See

http://ftp.cs.ucla.edu/pub/stat_ser/r431.pdf

Enjoy and use, it is easy.

I must end here, because I see that another lively discussion is going on below, concerning “why don’t we teach this stuff?”. I have a few thoughts to share on this topic.

Judea:

This is getting exhausting but I’ll try one more time. You write: “Andrew claims (I paraphrase): it is all in my book, so what is the big deal? Unfortunately, we are still waiting to hear from Andrew how we should take three variables (e.g., aspirin, flu and fever) and answer these three questions, given observational or experimental data.”

I never claimed that it is all in my book, or anything like that. My books are little and the world of statistics is large. There are lots of things that are not in my book. As I’ve discussed repeatedly on this blog, I don’t understand the point of your methods sometimes, but I recognize that many people find them useful, and I have no problem with people going outside my book to learn about them!

I also wrote nothing about aspirin, flu, and fever. That’s a problem you’re interested in.

What I did write was that I think it’s ridiculous for you to write that “statisticians do not grant reality any representation.” I gave several examples in which my colleagues and I form statistical models about reality, in areas ranging from toxicology to public health to political science. You are evidently interested in a different set of problems than I’m interested in. That’s fine. But to say that my work in toxicology etc. is not “a representation of reality” is just silly.

If you want to say that your methods are great and that they should get more coverage in statistics textbooks, that’s fine, just say that directly rather than saying that the rest of us are denying reality. Reality is a big place.

This quote from Judea Pearl (above) is a keeper:

Exactly! We don’t need data scientists per se, but scientists who can deal with computation, statistics, visualization, etc. If the computational or statistical problem you work on is hard enough (e.g., neural modeling), then you can get an appointment in a statistics department or computer science department. But as techniques become more mainstream, scientists who aren’t specialists in statistics or computer science can use them, and what used to seem like a computer science or stats problem becomes a biology or linguistics or neurology problem.

Agree, Judea put it well in that scientists really care about the “reality that produces data”.

I was trying to point out above, that has to be much harder to teach than math that is _just_ the science of abstractions.

Also not everyone clearly understands that statisticians/scientists work with models of reality rather than _directly_ with reality itself.

@Bob

Actually I did not understand @Judea’s quote. The “science of reality that produces data” is well & strong in the guise of fields like Physics, Chemistry, Biology etc. Isn’t it?

So what exactly is Judea asking for? I don’t see any reason to complain against a “Visualization Institute” or a “Data Analysis Center” or a “Differential Equations Program”

Studying “reality” needs tools.

Andrew: First, when you incorporate sampling theory within Bayesian theory you haven’t ousted sampling theory, or shown Box’s point to be incorrect. I don’t think it does justice to Box to say he was writing “before people realized”; he was drawing out two importantly different activities, and he’d still find them importantly different today. His “estimation” required an exhaustion of possibilities that “criticism” did not. That’s what enabled the latter to arrive at novel hypotheses. Second, I didn’t think you accepted the posterior probability move as Box imagined under estimation. Third, all the components can also be placed under the error statistical umbrella–so what?. The main thing is that Box had 2 distinct activities in mind, and the interesting question is how well those 2 illuminate statistical inference/inquiry.

> Now, since scientific advance, to which all statisticians must accommodate, takes place by the alternation of two different kinds of reasoning, we would expect also that two different kinds of inferential process would be required to put it into effect. The first, used in estimating parameters from data conditional on the truth of some tentative model, is appropriately called Estimation. The second, used in checking whether, in the light of the data, any model of the kind proposed is plausible, has been aptly named by Cuthbert Daniel Criticism.

A growing use of data is a third case, which is Prediction. When a bank scores a client, the model will be wrong. A test may fail to reject it, but only because there is not enough data. That is, however, not important. The bank just wants a robust and accurate model. Modeling can help prove that, but it is not the main point, so we can use things like boosting, which make no attempt to understand the underlying model. Predictive power is the main driver of data science, and this very practical case has not been a main concern of statistics; it is not even mentioned by Box.

I feel your pain here. I came into stats saying the same thing to Andrew: that machine learning was all about prediction whereas stats was all about retrospective data analysis. Andrew and co-authors even called their textbook Bayesian Data Analysis rather than Bayesian Prediction. If you read the start of the first chapter, though, the authors make it clear that the parameters in a Bayesian model may be used to represent unknown future outcomes, and their estimates are thus predictions.

Digging just a little deeper, it’s clear that prediction (in the sense of assigning probabilities to potential future outcomes) was the original motivation of statistics! Bernoulli, Pascal, et al. didn’t want a model of how dice worked in the past, but how they’d work in the future and whether some bets were wise or not. Essentially the stats were there to help support decision making.

Gosset didn’t set out to study statistics so he could have a nice model of the beer Guinness brewed in the past, but so he could decide how to make beer better in the future! Ditto Fisher and agriculture.

Statistics has also been widely used for data analysis to answer scientific questions, like Laplace’s analysis of whether there were more male than female births. These kinds of questions, which is what you mainly see in an intro stats text, are not intrinsically predictive in the way they are presented (though the estimates on which they are based could be used predictively). But if you test the hypothesis that fertilizer 1 is better than fertilizer 2, then the result will help you decide which fertilizer to use.

I think this intense focus on retrospective hypothesis testing in intro stats books leads people to believe that’s all stats does. That’s certainly how I felt after reading some introductions from the statistics perspective.

I could reword Box’s point into speculate, check, interpret (given the speculated and checked), (re-)speculate (given interpreted),…

(I recall another paper by Box with this or something close).

The checks have to allow for any surprise by brute force reality so can never be fully specified but must remain open ended.

(Otherwise, you would have to have a correct model for how the model you used in estimation could be incorrect.)

You may be thinking of Box’s 1980 paper that covers similar terrain: “Sampling and Bayes’ Inference in Scientific Modelling and Robustness”.

It has a provocative paragraph on page 391 discussing model checking as if it were defense against aerial assault.

Richard:

Thanks for posting, I actually was drawing on Box’s 1976 paper but like the 1980 one even more.

(And it had a reference to K Popper that provided an answer to my curiosity as to where Box got the scientific learning as an iterative process from. K Popper, in large part, got it from CS Peirce.)

I’m not sure you can take Box’s reference to Popper as evidence that “that’s where he got the scientific learning as an iterative process from.” I remember that when I was in elementary school (early or mid fifties), our science textbook described science as an iterative process. (It didn’t use the word “iterative,” but showed a diagram circling through what I would call an iterative process.) So “scientific learning as an iterative process” was common culture by then.

I did say curiosity rather than scholarship, but I can see how my comment could seem naive.

Box did seem to have a deeper grasp than an elementary school teacher though ;-)

Additionally in the 1976 paper he illustrates an aspect of scientific method, “In particular, its representation as a motivated iteration” using the development of statistical methods at Rothamsted Experimental Station by RA Fisher.

Some of us think Fisher got this insight from CS Peirce, but there is no record (or at least, as I last heard, S. Stigler is not aware of any).

Just read this quote and smiled

“A data scientist is a statistician who is useful.” – Hadley Wickham

from the London report on statistics and science that is making the rounds, see http://www.worldofstatistics.org/wos/pdfs/Statistics&Science-TheLondonWorkshopReport.pdf

I seem to have stirred up a hornet’s nest here (though I suspect that it would have happened without my initial post). I just had a few comments–first, I just glanced at the Spanos Ptolemy/Kepler paper (actually, read the graphs). The fact that this example wasn’t drawn from the same data set was something I hadn’t picked up on–to me that indicates that it may very well be difficult to find an example showing the superiority of some type of residual analysis to goodness of fit measures (though it’s very easy to find examples where goodness of fit is good and the results are nonsensical).

Second, Pearl seems to be hard to understand (see comments above in this blog and http://www.mii.ucla.edu/causality/?p=633 for another example). Maybe it’s because graphs are hard to understand, but when you have to continually correct people’s impressions of what you are saying either they aren’t that bright, have reasons for not comprehending (“it’s impossible to make a man understand something when his living depends on not understanding it”) or you need to change your presentation. I won’t attempt to classify the various participants by this trichotomy but I will state that graphs are hard, having tried to fit data to ERGMs and not getting good results when there should be such.

+1

I characterize Pearl’s approach as elegant in principle but often hard to apply practically and especially generically to all problems. It sure has its utility for certain types of problems but Pearl seems to oversell it. To the point where he denigrates other approaches.

My view is that the proof is in the pudding: The moment I see Pearl’s approach being adopted by a wide variety of users in applied areas (that I care about) I’ll be convinced of its immense power. So far I see it as a niche use. With a cult following that sees this as the best thing since sliced bread.

They are hard to understand bc they are not taught. Most social science PhDs take three semesters of stats and then often get basics wrong (so much for simplicity!).

I am convinced we can teach BA, MA, and PhD students how to do research in causal inference in one semester, and with a lower error rate. But we won’t know until somebody tries it out. Yet that often requires faculty approval, and then you get in a catch-22 (e.g. we don’t know if this works etc., so let’s wait and see).

Re “the proof is in the pudding” it is a highly conservative approach. You would still have been using an abacus long after the introduction of Arabic numerals, or not washing your hands to deliver babies long after the modern theory of disease. I don’t think cutting edge researchers should look in the rearview mirror.

If you’re convinced it can be taught, then do it! Nothing convinces people like a demonstration.

In my experience, most departments don’t provide a lot of oversight on syllabus content for anything other than core feeder classes (though stats does feel quite a bit more conservative to me than computer science did).

If you want a hand in college teaching and you’re not a professor, then you’re in the wrong job for controlling college curricula! Of course, if you’re outside the mainstream, getting such a job is harder. But you could always put together online teaching materials.

Bob:

I am doing it, just not in academia. The attitude in academia is not always conducive to innovation.

For example, during an academic conference I once asked a prominent young researcher who’d done some work on causal diagrams whether he’d be interested in putting together a panel etc. The response was that he did not want to be identified with causal diagrams bc many people did not like them, including Don Rubin (I’m not blaming Don for this as I don’t know his attitude). I still think my interlocutor is a great researcher but I found the attitude to be totally pathetic.

Meanwhile, at another conference a senior researcher said he simply did not like causal diagrams, and that was that. I could have discovered the cure for cancer. No matter. The message was the medium.

For better or worse in the private sector money talks above the tribalism. The price is you sometimes have to sell soap.

PS Our experiences re syllabuses are very different. I was once asked to update the syllabus. My proposal had to be revised by 4 professors in what played out to be a cultural clash. In the end we reverted to the status quo ante with some minor change at the margins.

PPS I disagree one has to be inside academia to change things. Often change comes from the outside. Moreover, since many educational institutions are publicly funded (through subsidies, grants, or tax exemptions) I feel entitled as a tax payer to criticize their work.

@Fernando:

Do you have any published applied work that relies on Causal Diagrams? Can you provide a link? I’d love to browse.

Rahul:

I am no longer in academia so I am not really submitting papers anymore but you can see my manuscripts on attrition or generalized causal inference on my ssrn page https://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=1159950

PPPS And regarding my last PPS, I thank AG for providing this platform and being such a good moderator.

Rahul, Bob, Fernando, et al.

I am paraphrasing the conversation on education.

Rahul says: The proof is in the pudding. I am waiting for a wide variety of users, in my areas.

Fernando: This is too conservative. Causal diagrams seem hard b/c they are not being taught.

Bob: Why don’t you teach it?

Fernando: I tried, the resistance is cultish.

Here are my thoughts about teaching causality.

Last year, the American Statistical Association announced an annual $10,000 “Causality in Statistics Education Prize”, “to encourage the teaching of basic causal inference methods in introductory statistics courses.”

In an interview with Ron Wasserstein on Amstat, I said:

“... the prize is aiming to close a growing gap between research and education in this field. While researchers are swept in an unprecedented excitement over new causal inference tools that are unveiled before us almost daily, the excitement is hardly seen among statistics educators, and is totally absent from statistics textbooks.”

“I believe the availability of such educational material will convince every statistics instructor that causation is easy (It is!) and that he/she too can teach it for fun and profit. The fun comes from showing students how simple mathematical tools can answer questions that Pearson-Fisher-Neyman could not begin to address (e.g., control of confounding, model diagnosis, Simpson’s paradox, mediation analysis), and the profit comes because most customers of statistics ask causal, not associational, questions.”

http://magazine.amstat.org/blog/2012/11/01/pearl/

I am citing this interview because I firmly believe that today, the major impediment to the understanding of causality in statistics is the lack of educational materials at the introductory level.

So, fellow bloggers, here is a proposition for those who are interested in teaching causal inference. I am currently co-writing an introductory, undergraduate text on the topic, and we need substantive feedback from educators, to tell us where students may get stuck, or where we can make it easier for instructors to present in class. If you are interested in providing such feedback, please write to me personally at judea@cs.ucla.edu, and I will send you the chapters as they become available.

Additionally, referring to Fernando’s remark on the fears he encountered among colleagues, I promise to keep your identity secret, so that arrow-phobic vice-squads will not associate your name with this heretical project.

Correction (I should have made this clearer):

The snafu with the syllabus was not about causal diagrams but about game theory and rational choice (different course). I just wanted to let Bob know that my experience re syllabus was very different from his. But it is not connected to other points re causal diagrams I made above. (Though I suspect that would not have gone down too well either…)

numeric:

Are you speaking about Exponential Random Graph Models or does ERGM mean something different in this context? ERGMs are quite widespread now in Social Network Analysis and I have not heard of particular problems to get “good results”. And while graphical models – at least SEM – consider the edges to be fixed and the nodes to be random variables, it’s the opposite for Social Network Analysis including ERGMs as far as I know. I don’t know Judea Pearl’s approach very well, so I might be missing something.

Are you speaking about Exponential Random Graph Models? Yes

ERGMs are quite widespread now in Social Network Analysis. This was a biological application (connections between CRMs (cis-regulatory modules)), so the “nodes” (the CRMs) were fixed and the connections random. The results may very well have been bad because there simply weren’t any to be found, but I found it very difficult to figure out what was going on: the usual intuition most statisticians have about a problem was missing and I suspect I wouldn’t be alone in that lack. So my point was that graphs are hard for statisticians, and Pearl’s are even more difficult. As far as SEMs and modelling observational data, in biology there is so much experimental variation that this seems unlikely to work (it would be marvelous if it did). Biologists like to distinguish between biological variation and technical variation: when you split the petri dish into two and run the same analysis on both, that’s technical variation. When you prepare two petri dishes with the same protocol, then run the analyses, that’s biological.
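The split-dish versus separate-dish distinction can be illustrated with a small simulation. This is only a sketch under invented assumptions: Gaussian noise, a biological standard deviation of 2.0, a technical standard deviation of 0.5, and hypothetical helper names `new_dish` and `measure`. It shows how the two designs separate the two variance components.

```python
import random
import statistics

rng = random.Random(1)
SD_BIO, SD_TECH = 2.0, 0.5  # assumed standard deviations, illustrative only
N = 20_000

def new_dish():
    """A dish prepared with the same protocol: mean level plus biological variation."""
    return 10.0 + rng.gauss(0, SD_BIO)

def measure(dish_level):
    """One assay run on material from a dish: its level plus technical noise."""
    return dish_level + rng.gauss(0, SD_TECH)

# Split one dish in two and assay both halves:
# the difference contains technical noise only.
split_diffs = []
for _ in range(N):
    dish = new_dish()
    split_diffs.append(measure(dish) - measure(dish))

# Prepare two dishes separately and assay each:
# the difference contains both biological and technical variation.
separate_diffs = [measure(new_dish()) - measure(new_dish()) for _ in range(N)]

# The variance of a difference of two independent draws is twice
# the per-measurement variance, hence the division by 2.
technical_var = statistics.pvariance(split_diffs) / 2
total_var = statistics.pvariance(separate_diffs) / 2
biological_var = total_var - technical_var
```

With these assumed values the biological component dominates, which is the situation described above: the experimental variation between dishes can swamp whatever signal a model of observational data is trying to pick up.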

This gets to a larger point discussed in these various exchanges. Causal modelling should work better in biology, as the underlying processes are understood to a much better extent than social processes (I can give a reasonably coherent explanation as to why certain genes are expressed in certain cell states but explaining the underlying roots of racism (aside from a hand-waving appeal to sociobiological principles) is beyond me). Yet it’s not used (network analysis is used, but this drops the inference of causality).

Network analysis is indeed very widely used in areas of biology like gene expression, proteomics etc. What I’m still unsure about is: (a) How exactly would one add Pearl’s causality ideas into this framework & (b) how much extra value it’d add to what’s already done

Adding “causality” (a SEM approach, that is) would be tremendously helpful if it can be done. The expression (the first step in the creation of proteins) of certain genes is thought to influence the expression of other genes, so if one could statistically determine what gene triggered which, then one would have a powerful tool. As an example, cancer patients genomes typically have hundreds of genes expressing differently than in a normal cell, but it is almost certain that all of these genes simultaneously express at different levels. The holy grail is to find one gene that goes haywire and then develop treatments to prevent that from happening (needless to say, it is probably rarely one gene, but some small subset of genes). Ideally one wouldn’t need to do causal modelling–one could culture a cell line and find the genes which are going cancerous and observe how their expression patterns change over time. When I ask real biologists whether this is possible, they just laugh at me. So there is room in the foreseeable future for statistical analysis.

I should mention this because Andrew’s blog is weighted towards subjects he is familiar with–statistics and political science (and every now and then a literary critique of dubitable quality). Yet it is biology where there is a pressing need for quality statistical investigation–and it is infuriating as to how little statistics seems to add to our understanding of biological processes (the network analyses referred to by Rahul above are dismissed by practicing biologists as “blobs”, because one typically gets the gene set partitioned into several large clumps, on which most of the nodes are connected). I encourage statisticians to try out their theories on the many datasets available on-line.

“As an example, cancer patients’ genomes typically have hundreds of genes expressing differently than in a normal cell, but it is almost certain that all of these genes simultaneously express at different levels. The holy grail is to find one gene that goes haywire and then develop treatments to prevent that from happening (needless to say, it is probably rarely one gene, but some small subset of genes). Ideally one wouldn’t need to do causal modelling–one could culture a cell line and find the genes which are going cancerous and observe how their expression patterns change over time.”

This is one theory. There is another, based on the observation that “every” cancer cell (at least 95%, and consistent with 100% once errors in identifying cancer cells are taken into account) is aneuploid (has extra or missing chromosomes), and so has many misexpressed genes. It is interesting to look into the history of that.

My rough understanding of Pearl:

His argument is that statistics doesn’t have a language for expressing causal concepts (instead dealing with causation implicitly), and that he has developed such a language (representational framework, inference rules, etc.), and that this framework has clear advantages over standard statistical approaches in modelling potential interventions in various domains.

I don’t know enough of the technical details to take sides on this, but Pearl hasn’t set out many convincing examples of where statistical analyses have actually gone wrong as a result of not having an explicit language for expressing causal (rather than associational) concepts.

I vaguely remember reading a paper by Greenland arguing that the “healthy worker effect” (in epidemiology) may be a bias stemming from a lack of formal causal concepts. But really, if formal causal language is that important, then surely we’d have many more examples of systematic errors to point to?

I would argue that the entire application of statistical practice in research is fundamentally broken. This is certainly the case in biology, genetics, genomics, medicine, epidemiology (which I’m familiar with) and probably other fields as well.

Would a more explicit representation of assumed causal relations in a model help? The problem is that your question is in itself a causal question involving an unobservable counterfactual so it’s easy to argue endlessly about this because by its nature there’s no conclusive answer. Causal graphs may not solve everything, but I would say the current state and poor quality of applied statistical practice speaks for itself that what we’re doing now is unacceptable and arguably immoral.

“Immoral”? Really?! Strong words!

And while specific areas may have their problems, aren’t you painting with too wide a brush? “Applied statistical practice” in the hard sciences seems quite robust really. Sure they have their share of problems but which field doesn’t?

Applied statistical practice in hard sciences (e.g. physics) does better in part because there are no ethical issues re experimentation, so they do a lot more experiments than social science and medicine.

As someone who works in these fields, yes, really. No hyperbole – people do live and die by conclusions drawn from statistical inference, very directly so in the fields I mentioned. The vast majority of practitioners in those fields have a weak grasp of the relationship between statistical procedures they’re applying and the things they’re studying. Anil Potti is one case in point (among many).

anonymous – i get that these problems are important, but the rhetoric is just unconvincing.

Surely if the whole domain of statistical design / inference has been so fundamentally wrong for so many years, it would be easy to point to a few examples where:

1) they had clearly erred

2) Pearl’s framework would have avoided such errors

Pearl is fond of posing challenges, so surely it’s fair to pose one to him and his fellow travelers. And please, not toy problems or toy problems which are so widely cited in their favor – but rather, real world examples. After all, he’s concerned about modelling the real world, right?

To Interested Builder,

You say: “the rhetoric is just unconvincing.”

The rhetoric of those who advocated Arabic-Hindu numerals was many times more unconvincing. It took in fact 800 years for this notational system to take hold in Europe, and how? Not through the scholars, but through practitioners, merchants and peddlers that got sick and tired of the old system. It was rejected vehemently by those who judged it without trying it.

So, I am asking you a personal question: Have you tried to read DAGs? Never mind causal inference, have you tried DAGs to answer some of the most rudimentary questions about regression? For example, “when would adding a regressor to an equation change a given coefficient in that equation”? Those who never tried DAGs keep on saying: “Unconvincing!”, but those who try never go back to the dark days of DAG-less regression analysis. Try it.
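For readers who want to try the regression question concretely, here is a minimal sketch in Python. It is not taken from the comment above: the linear model, the coefficients (1.0 for X, 2.0 for Z), and the function names are all assumptions made for illustration. It compares the coefficient of X in Y ~ X with the coefficient in Y ~ X + Z under two assumed graphs: one where Z also causes X (a confounder), and one where Z affects only Y.

```python
import random


def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)


def slope_simple(x, y):
    """Coefficient of x in the least-squares regression y ~ x."""
    return cov(x, y) / cov(x, x)


def slope_adjusted(x, z, y):
    """Coefficient of x in y ~ x + z, from the 2x2 normal equations."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    return (sxy * szz - sxz * szy) / (sxx * szz - sxz ** 2)


def simulate(z_causes_x, n=20000, seed=0):
    """Assumed toy model: Y = 1.0*X + 2.0*Z + noise.

    If z_causes_x is True, the graph also has Z -> X (Z is a confounder);
    otherwise Z and X are independent causes of Y.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [(zi if z_causes_x else 0.0) + rng.gauss(0, 1) for zi in z]
    y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return slope_simple(x, y), slope_adjusted(x, z, y)


if __name__ == "__main__":
    # With Z -> X the unadjusted slope is roughly 2 and the adjusted one roughly 1.
    print("Z confounds X and Y:", simulate(z_causes_x=True))
    # Without Z -> X both slopes are roughly 1: adding Z changes nothing.
    print("Z independent of X: ", simulate(z_causes_x=False))
```

In this assumed setup the coefficient moves only when the added regressor is connected to X in the graph, which is the kind of answer one would try to read off the diagram before running any regression.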

You further write:

Surely if the whole domain of statistical design / inference has been so fundamentally wrong for so many years, it would be easy to point to a few examples where:

1) they had clearly erred

2) Pearl’s framework would have avoided such errors

I will be glad to show you several such examples. But first note that “the domain of statistical design / inference has NOT been fundamentally wrong for so many years”. Instead of being wrong, statistics simply avoided certain questions as “not well defined” (see Lindley and Novick, 1975) and left them outside the province of statistics proper. (Something I call “faceless”).

As to mistakes, plenty; please read Don Rubin’s account of how Fisher went wrong on mediation (Rubin’s Fisher Lecture, 2005) and, ironically, my account of how Rubin went wrong on the same problem, mediation (Pearl, 2012), and how causal mediation has triumphed (Wikipedia, “Mediation (statistics)”).

You write:

“And please, not toy problems or toy problems which are so widely cited in their favor – but rather, real world examples.”

I take issue with this statement. Toy problems are conveyers of understanding. If you cannot convey an idea with a three-variable example, you will not be able to convey this idea with 47 variables, in noisy data, and pages upon pages on how you labored to improve your estimator. Aspirin – flu – fever are not less “real” than educational equality and inequality in 48 states. On the contrary, I have seen many ideas misrepresented under the guise of “real life problems”; none in toy problems. One cannot hide bad ideas in the clarity of toy problems.

Still, the blunder of Fisher cannot be labeled “toy”; that is why I brought it up as an example.

@interested bodybuilder

The “toy problem” critique in a way goes to the crux of this discussion.

It reminds me of the email war between Linus Torvalds & Tanenbaum over microkernels. There’s a difference between demonstrating by sheer arguments that a technique is overwhelmingly the best versus actually implementing it, & demonstrating it in the wild on actual use cases that matter to people.

In a lot of ways this current discussion has a flavor reminiscent of this old debate.

Rahul,

Your preference for demonstrating superiority “in the wild on actual use cases that matter to people” is based on the premise that one has a ground truth that will surface in the wild better than on toy problems. This is not the case in most causal inference problems.

Take the task of assessing educational inequality in 48 states. We cannot run randomized experiments to verify the validity of the conclusions. The model is made up of dozens of “conditional ignorability” assumptions among dozens of variables, some of them plausible and some not, and no one can argue with the modeller, because conditional ignorability assumptions are cognitively formidable. If the superiority of DAG-based methodologies stems from permitting researchers to articulate more plausible assumptions than their DAG-less counterparts, this superiority will never show up in the data or in performance; it would remain a matter of dispute between the two camps.

The beauty of toy examples is that they bring with them ground truth. We know what to expect from flu, aspirin, and fever. So, if one’s favorite methodology does not deliver the expected on the flu-aspirin-fever example, we can be fairly sure that the method will not deliver valid results in the educational equality problem.

Andrew’s insistence on avoiding the flu-aspirin-fever example, citing his disinterest in aspirin-like problems, misses the point that we need this example for testing the validity of his methodology, not to measure the effect of aspirin.
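To make the “ground truth” idea concrete, here is a small simulation sketch in Python. The thread never spells out the structure of the flu-aspirin-fever example, so everything below is an assumption made for illustration: flu raises the chance of fever, people with flu are more likely to take aspirin, aspirin lowers the chance of fever, and all probabilities are invented. Because the data-generating process is known, the true effect can be computed exactly and compared with what an analysis recovers.

```python
import random

# Assumed (hypothetical) probabilities; none of these numbers come from the thread.
P_FLU = 0.3
P_ASPIRIN = {True: 0.8, False: 0.1}            # P(aspirin | flu)
P_FEVER = {(True, True): 0.6, (True, False): 0.9,
           (False, True): 0.05, (False, False): 0.1}  # P(fever | flu, aspirin)

# Ground truth: effect of aspirin on fever, averaged over the flu distribution.
TRUE_EFFECT = sum(
    (P_FLU if flu else 1 - P_FLU) * (P_FEVER[(flu, True)] - P_FEVER[(flu, False)])
    for flu in (True, False)
)


def simulate(n=100000, seed=1):
    """Draw (flu, aspirin, fever) triples from the assumed model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        flu = rng.random() < P_FLU
        aspirin = rng.random() < P_ASPIRIN[flu]
        fever = rng.random() < P_FEVER[(flu, aspirin)]
        rows.append((flu, aspirin, fever))
    return rows


def fever_rate(rows, aspirin, flu=None):
    """Observed fever frequency among rows with the given aspirin (and flu) status."""
    sel = [fev for f, a, fev in rows if a == aspirin and (flu is None or f == flu)]
    return sum(sel) / len(sel)


def naive_difference(rows):
    """Fever rate of aspirin takers minus non-takers, ignoring flu."""
    return fever_rate(rows, True) - fever_rate(rows, False)


def adjusted_difference(rows):
    """Same contrast computed within flu strata, then averaged over P(flu)."""
    total = 0.0
    for flu in (True, False):
        weight = sum(1 for f, _, _ in rows if f == flu) / len(rows)
        total += weight * (fever_rate(rows, True, flu) - fever_rate(rows, False, flu))
    return total


if __name__ == "__main__":
    rows = simulate()
    print("true effect:      ", TRUE_EFFECT)
    print("naive estimate:   ", naive_difference(rows))    # positive: aspirin "causes" fever
    print("adjusted estimate:", adjusted_difference(rows)) # close to the true effect
```

Under these assumptions the unadjusted comparison gets the sign wrong while stratifying on flu recovers the known answer. A method that failed this check would be hard to trust on a problem where no such answer is available.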

I think re-quoting Box will be illuminating at this point (slightly revised):

“The great advantage of toy problems over the “wild world” approach, it seems to me, is that at any given time we know what we are doing.”

Delivering the expected result on the toy case might be a necessary condition but it is in no way a sufficient condition. No doubt toy problems are great as pedagogy and even as method validation perhaps.

Let’s grant that your flu-aspirin-fever example does indeed deliver valid results for the toy case. That’s awesome! But how does this in any way imply that it *must* be a viable or attractive methodology for large, real, messy, practical problems with incomplete noisy data? Maybe it is, maybe it isn’t. But I’ll only know when I see it in use. This cannot be proved by an armchair thought experiment!

The fundamental point is that no one inherently cares about the performance on toy cases, except as a means to a larger goal. The truth indeed may *not* surface any easier in the wild. But that’s the *only* “truth” that matters!

How do you get to “the *only* truth that matters” if your method can’t pass the toy test?

CK:

Rahul wrote:

Rahul is arguing that solving toy-problems is not enough. And I agree. If a method can’t prove itself to be applicable in real research problems, solving toy problems alone will not make it useful for researchers. Hence, it’s sensible to wait for such examples before you jump onto the bandwagon.

I also agree with Rahul. I’ve recently read Pearl’s book (Causality) from cover to cover and found it to be very well written and wonderfully informative. And, in the context of the assumed model, the math is impeccable. I found much to agree with in this book (but also a few places that I very much disagreed with that I won’t go into now, mostly having to do with inference from randomized trials… and I don’t have a copy of the book in front of me). The main problem with this approach is one of practicality, and it’s not a small problem. The basic idea of the DAG approach (and actually of the non-randomized causal approach in general, as Pearl shows) is to establish conditional independencies within the model (i.e., to establish that a causal effect exists, it’s more important to consider the arrows that are excluded from the model than those that were included). However, all of the examples presented in Pearl’s book explicitly assumed that we were dealing with known probabilities, not empirical correlations. I contend that it’s not possible from empirical data to establish that two variables are independent. Given lots and lots of data, one can certainly get fairly precise estimates on the degree of dependency. But dependency cannot be ruled out no matter the sample size, and it’s unclear what the magnitude of dependence would need to be in order to cast doubts on the causal conclusion.

I repeat that not one of the examples (toy or otherwise) in Pearl’s book demonstrated how one would account for the uncertainty in empirical data.
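The point about sample size can be illustrated with a short sketch in Python. It uses the standard Fisher z-transformation to build an approximate 95% confidence interval for a correlation; the simulated data (two independent Gaussian variables) and the function names are assumptions made for illustration. The interval narrows as the sample grows but always has positive width, so small nonzero dependence is never excluded.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


def ci_for_independent_sample(n, seed=0):
    """Interval for the correlation of two truly independent Gaussian samples."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    return fisher_ci(pearson_r(x, y), n)


if __name__ == "__main__":
    for n in (100, 10000):
        lo, hi = ci_for_independent_sample(n)
        print(n, "interval width:", hi - lo)
```

Even with 10,000 observations of genuinely independent variables, the interval still spans correlations of a few hundredths in either direction. How large a dependence would have to be to undermine a causal conclusion is a separate question that this calculation does not answer.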

Daniel:

No one is saying that solving toy problems is sufficient. But if it is necessary (as Rahul seems to agree) then it should be part of the problem solving. Don’t you think so?

Mark:

I think the “6 men and the elephant” analogy is helpful. I’ve devoted my career to modeling reality as I see it (sorry, Judea!), with a heavy focus on variation and a much lighter attention to causal structure. Judea has focused on causal structure without much interest in the particularities of variation. At some point it might be possible to combine our perspectives (I’m sure people are working on it!) but in the meantime I think it’s helpful that each of us is making progress, of a sort. I think Pearl is naive to think that his current approach can solve all problems—like you, I think there’s a real limit to how far he can get without trying to model variation—but I respect that he’s trying. Sometimes that’s how research progress is made, and certainly a lot of people have found his approach to be helpful.

I do think you will see it being used more and more in _actual_ applications (with unexpected challenges).

I have seen a couple in my current field – so far they have either lacked credibility or did not give enough details to determine if it was credible.

Here is some recent discussion that suggests “Advanced analytical methods and tools such as directed-acyclic graphs (DAGs) and Bayesian statistical techniques, can provide important insights and support interpretation of [should be used more routinely] as part of an epidemiology” analysis:

http://cfpub.epa.gov/si/si_public_file_download.cfm?p_download_id=516895

And Andrew’s “I’m sure people are working on it!” http://www.ncbi.nlm.nih.gov/pubmed/19363102

Mark:

I think your point is spot on; there is no easy way to determine to what extent two variables are independent of one another, taking into account the presence of other variables. This is actually the main problem I have had when applying DAGs to biology. After searching frantically through the literature, I think there are a couple of compelling ways that do a “good enough” job of causal inference in some nice settings.

Generally the question is: how sure are we that these two variables are actually independent? Mutual information is one approach (discussed on this blog; see [1] and links therein). Other approaches assess whether two variables are significantly correlated with each other, and if no significant association is found at a given p-value threshold, causal independence is inferred. While this is not valid in general, in practice it can provide a decent estimate with a conservative (i.e., large) p-value threshold. Partial correlation is (from what I have read at least) a popular way to assess independence for linear models, although unless the data are normally distributed (in our case they are not) it is not possible to infer causal relationships [2]. From a lay perspective, there is actually a nice Stack Exchange post about this problem as well [3].
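As a minimal sketch of the partial-correlation idea, the Python snippet below simulates an assumed linear Gaussian chain X -> Z -> Y (the model and all names are hypothetical, chosen only for illustration) and compares the ordinary correlation of X and Y with their partial correlation given Z. For jointly Gaussian variables a zero partial correlation corresponds to conditional independence; outside that setting it only rules out linear dependence.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate_chain(n=20000, seed=0):
    """Assumed toy chain X -> Z -> Y with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [xi + rng.gauss(0, 1) for xi in x]
    y = [zi + rng.gauss(0, 1) for zi in z]
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate_chain()
    print("corr(X, Y):        ", pearson_r(x, y))        # clearly nonzero
    print("partial corr | Z:  ", partial_corr(x, y, z))  # close to zero
```

The partial correlation here is only an estimate, so in practice one still has to decide how close to zero is close enough, which is the difficulty raised earlier in the thread.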

Potentially a larger problem is that it’s not clear how complex systems with feedback loops can be represented as a DAG. In particular (this is the problem I have been working on), in an evolutionary framework, many body features like brain size, body size, and social grouping likely evolved together. In some environmental contexts, larger brains lead to larger bodies, but within the same species, in other contexts, larger bodies may lead to larger brains. This type of relationship seems hard to capture in a DAG.

In contrast, causal inference seems to work better in simpler settings where the directedness of relationships is a bit clearer, like the “disease, symptom, and aspirin” question above. Which is fine, except that I think a lot of interesting systems (particularly in the biological and social sciences) are characterized by complex feedback relationships. These systems seem really hard to capture as DAGs.

(Although maybe I am missing something, if so let me know; I would love to stop tearing my hair out over it!)

[1] https://andrewgelman.com/2014/05/07/nonparametric-measures-mutual-information/

[2] http://dx.doi.org/10.1111/j.1467-842X.2004.00360.x

[3] http://stats.stackexchange.com/questions/73646/how-do-i-test-that-two-continuous-variables-are-independent

Are you suggesting that causality can be inferred without prior causal knowledge by simply computing p-values and partial correlations?

With regard to the usefulness of DAGs on feedback loops you can read the work of James Robins on time dependent covariates.

“Are you suggesting that causality can be inferred without prior causal knowledge by simply computing p-values and partial correlations?”

I would not say anything nearly that strong; I would say that conditional independence (which seems critical for building causal graphs) may be possible to assess in some nice cases, to some degree of confidence, using things like mutual information criteria or other statistical tests. It seems like there are a couple of approaches that might work okay, in an application-dependent fashion, but hopefully the number of qualifiers above conveys my own uncertainty about how easy it is to do this type of inference.

Whether or not these tests (or, when combined with DAG learning algorithms, the inferred causal graph) actually say anything about the true underlying causal structure of the system is hard to say, and likely depends greatly on how well the model used to test for conditional independence fits reality, and on whether any variables or covariates are missing from the graph (or at least that has been the case in my experience).

In my mind, the place where these techniques work the best is when looking for large causal relationships in large data sets. The question there is not so much: “what is the definitive causal graph that represents reality”, but “what are the large factors that cause this thing to happen”. This comes up from time to time in the medical setting, e.g., “I just ran a massive study of 10,000 cancer patients and collected 100 measures from each one. They all correlate horribly with each other. I have 1 million dollars of research funding to throw at this problem; what are the biggest underlying factors for us to focus our efforts on, taking into account interactions between factors?”

“With regard to the usefulness of DAGs on feedback loops you can read the work of James Robins on time dependent covariates.”

Thanks! I will take a look at it.

@Mark

Causality is _always_ established by assumption. In this respect DAGs make such assumptions clear (and much more).

Even in a perfect randomized controlled trial you cannot rule out the possibility that the observed p-values, estimates, or CIs reflect chance alone, or that a non-effect is the result of two countervailing effects, and so on. Bayesians appear more transparent in this regard, but this is compatible with DAGs (cf. posterior edge marginals), although people often forget to write that Bayes’ theorem is conditional on assumptions too, e.g., P(\theta|Y;A), and so on.

Moreover, it is useful to at times pretend we are God and know the causal structure. This mode of thinking helps us (1) _Define_ (and explain) the structural nature of the problem; (2) derive what observable manifestations all problematic models have in common; and (3) use such observable predictions to infer from associations alone whether the underlying (unknown) structure is problematic or not in practical applications (where we can no longer pretend to be God). (Alternatively we can assign (posterior) probabilities to each state of the world.) So here is one case where pretending you know stuff can be useful later on when you don’t want to pretend you know anything. The problem is that probability is not a good language for the situation where we pretend we are God. Indeed, God may or may not play dice, but I am pretty sure He does not speak in probabilities.

Is this method 100% foolproof? No, but no scientific method I know of is. The issue is whether it is useful. In my experience it is. It has helped me understand things like attrition or generalization much better than using ambiguous conditional probabilities (which, per ambiguity, don’t help define or explain), or unwieldy potential outcomes.

PS This is what physicists do. They imagine a world they cannot observe directly, mathematize it, derive testable implications, and perform these tests to update their world view. I think DAGs make such theorizing very accessible, and more. This is not to say people who don’t use DAGs don’t theorize. AG’s pharmacokinetics application, to pick one example, has a lot of theory in it (in one version it even has a “structural” diagram which I found very useful!), but DAGs offer a half-way house between the implicit theorizing common in many empirical articles and the highly mathematical theorizing common in economics.

July 7

Rahul and other discussants of Toys vs. Guns.

There have been several discussions today on the usefulness of toy models and on whether toy tests are necessary and/or sufficient when the ultimate test is success in the wild world, with its “large, real, messy, practical problems with incomplete noisy data”. I think the working assumptions in many of the comments were unrealistic.

Surely, passing the toy test is insufficient, but if this is the ONLY test available before critical decisions are to be made, then speaking about insufficiency instead of conducting the test is surrealistic if not dangerous.

What I tried to explain before, and will try again, is that, when it comes to causal inference, there is no such thing as testing a method in the “wild world”, because we cannot get any feedback from this world, nor any indication of its success or failure, save for the “roar of the crowd” and future disputes on whether success or failure were due to the method implemented or due to some other factors.

Under such circumstances, I don’t understand what alternative we have except for testing candidate methods in the laboratory, namely on toy models. And I don’t understand the logic of refraining from toy testing until enough people use one method or another on “real life problems.” It is like shooting untested guns into highly populated areas, in foggy weather, while waiting for wisdom to come from the gun manufacturer.

And I would also be very wary of “alternative methods” whose authors decline to submit them to the scrutiny of laboratory tests. In fact, what “alternative methods” do we have, if their authors decline to divulge their names?

The second working assumption that I find to be mistaken is that DAG-based methods are not used in practice, but are waiting passively for DAG-averse practitioners to try them out. Anyone who reads the literature in applied health science knows that DAGs have become a second language in epidemiology, biostatistics and the enlightened social sciences. DAG-averse practitioners should therefore ask themselves whether they are not missing precious opportunities by waiting for their peers to make the first move. First, it is an opportunity to catch up with the wave of the future and, second, it is an opportunity to be guided by models of reality so as to “know at any given time what they are doing” (G. Box).

No, DAG-based methods are not a panacea, and you would not know if your method is successful even if you see it in use. But one thing you would know: that at any given time you acted in accordance with your knowledge of the world and in concordance with the logic of that knowledge.

On a positive note, I liked this study:

“Using Directed Acyclic Graphs for Investigating Causal Paths for Cardiovascular Disease”

http://omicsonline.org/using-directed-acyclic-graphs-for-investigating-causal-paths-for-cardiovascular-disease-2155-6180.1000182.php?aid=20947

To me, it seems we need more work of this kind & with even larger “real” datasets & exploring more complex relationships.

I’d love to see fewer papers explaining DAGs & exhorting us to use them etc. & more papers that _actually use_ DAGs on real, big, complicated & important problems.

Andrew W., for working with feedback loops instead of DAGs, check out the field of system dynamics (SD). John Sterman’s somewhat pricey /Business Dynamics/ is one of the current canonical texts (hint: you might want to check it out at a library first), and there are many other good ones. That field stresses one of the things I think I hear Judea Pearl suggesting: creating causal models that match our hypotheses of the problem we are addressing both in structure and in behavior. Such a model has the potential to predict behavior even in new, unforeseen situations.

You can implement SD models in various commercial simulators (Vensim, iThink, …), you can implement them with MCSim or deSolve, or you can code your own. The MCSim quick reference card at https://www.gnu.org/software/mcsim/ gives a simple example of an SD model in MCSim. Andrew G.’s PBPK models with Frederic Bois have that sort of feedback structure, too.
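For the “code your own” route, a very small sketch in Python might look like the following. It is not an MCSim or deSolve model; the two-stock system, its parameter values, and the use of a fixed-step Euler integrator are all assumptions chosen to keep the example short. Each stock has a constant inflow, a proportional outflow, and a boost proportional to the other stock, which is the kind of mutual feedback a DAG cannot draw directly.

```python
def simulate(coupling=0.5, decay=1.0, inflow=1.0, dt=0.01, t_end=20.0):
    """Euler integration of a hypothetical two-stock feedback model.

    dx/dt = inflow + coupling * y - decay * x
    dy/dt = inflow + coupling * x - decay * y
    Returns a list of (time, x, y) tuples.
    """
    x, y = 0.0, 0.0
    history = [(0.0, x, y)]
    steps = int(round(t_end / dt))
    for i in range(1, steps + 1):
        dx = inflow + coupling * y - decay * x
        dy = inflow + coupling * x - decay * y
        # Update both stocks from the same time step's rates.
        x, y = x + dt * dx, y + dt * dy
        history.append((i * dt, x, y))
    return history


if __name__ == "__main__":
    _, x_fb, y_fb = simulate(coupling=0.5)[-1]
    _, x_no, y_no = simulate(coupling=0.0)[-1]
    print("with feedback:   ", x_fb, y_fb)  # settles near inflow / (decay - coupling)
    print("without feedback:", x_no, y_no)  # settles near inflow / decay
```

With these assumed parameters the loop doubles the steady-state level of both stocks relative to the uncoupled case. A dedicated tool would add a proper ODE solver and parameter estimation on top of the same structure.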

MCSim seems very interesting.

Naive Question: Is there an overlap between MCSim & Stan or are they intended for totally different uses?

Once we add the solve_ode function to Stan’s language, we’ll be able to code up general MCSim models. We’ve been working with Frederic Bois (author of MCSim) on some applied modeling issues.

Thanks Bob! And what is key stuff that Stan can do that MCSim cannot?

Stan uses HMC/NUTS for MCMC instead of Metropolis, so Stan adds sampling efficiency.

Stan’s set of types is much more expressive. For example, MCSim doesn’t support multivariate hyperpriors, which is something we’ve been using for patient-specific “random effects” in PK/PD models. Stan gives you a wide range of special functions including linear algebra, densities, and (C)CDFs.

“That field stresses one of the things I think I hear Judea Pearl suggesting: creating causal models that match our hypotheses of the problem we are addressing both in structure and in behavior. “

It’s sad that the research world has really gotten to the point where this needs to be a “suggestion”. Unfortunately, there are some fields (large swaths of medicine and genomics for example) where people have no idea how to go about this.

sorry, should be:

“toy problems or toy models”

I actually think economics is a field that has focused on the issue of causality and data-generating processes more than most fields. Structural estimation (not SEMs), is practiced by several people in the UCLA economics department (I mention this because J. Pearl is at UCLA); Rosa Matzkin is foremost among the econometricians there studying structural methods. Structural estimation explicitly models the data-generating process based on “theories of reality” (for lack of a better term).

The other side of causal estimation in economics, “reduced-form” methods, focuses on the potential-outcomes framework. Perhaps the biggest issue for the latter is not identifying *a* causal effect, but understanding *what* causal effect is actually being measured. Theory would help elucidate this.

Many theoretical models in economics seem analogous to the drawings I see in stats papers, but perhaps formalized more coherently when done correctly.

Open to different/better interpretations of all of the above, however.

James,

Your comments on econometrics are extremely pertinent to this discussion. Econometrics is indeed one of the two remaining islands of researchers who could benefit most from modern tools of causal analysis, because their models are, as you put it, models of the data-generating process, based on “theories of reality”. Unfortunately, they are still unaware of this opportunity, and I discuss the reasons for this estrangement in these papers:

http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf

http://ftp.cs.ucla.edu/pub/stat_ser/r395.pdf

As you can see from the first, I attribute the estrangement to a profound ignorance of how to read DAGs. Even Jim Heckman, who recently accepted the structural reading of counterfactuals (with a minor modification), could not bring himself to accept the second principle of causal inference: reading independencies from the model structure (also known as “d-separation”).

To remedy the situation, Chen and I wrote a survey paper

http://ftp.cs.ucla.edu/pub/stat_ser/r428.pdf

which demonstrates the benefits of graphical tools in the context of linear models, where most economists feel secure and comfortable.

Still, your point about further collaboration with econometricians is well taken, and I will forward your post to Rosa Matzkin as proof that we should renew our discussions of the past.

One difference between economists and causal analysts lies in the notion of “identification”, which to economists means identifying the form of the equations and to causal analysts means identifying questions of interest (e.g., causal effects), leaving the equations undetermined.

On the positive side, I should note that the paper on Haavelmo’s legacy:

http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf

is the first that I was invited to write for a mainstream econometrics journal. Only time will tell whether this will help bridge our enigmatic wall of estrangement. At any rate, the editor, Olav Bjerkholt, deserves a medal of courage for his heroic attempt to create a dialogue between two civilizations.

I never thought “reading DAGs” was controversial. It’s just a way of writing a model down that should be familiar to anyone who’s used BUGS or JAGS. And it then lets you do useful computations graphically like d-separation or finding Markov blankets.
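To show what “doing d-separation graphically” amounts to as a computation, here is a compact sketch in Python. It uses the standard ancestral-moral-graph criterion (restrict to ancestors of the nodes involved, connect co-parents, drop edge directions, then check whether the conditioning set blocks every path). The dictionary-of-parents representation, the function names, and the example graphs are assumptions for illustration, not any particular library’s API.

```python
def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set()
    stack = list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated by `given` in the DAG.

    `parents` maps each node to a list of its parents; nodes without
    parents may be omitted. Assumes x and y are not in `given`.
    """
    given = set(given)
    keep = ancestors(parents, {x, y} | given)

    # Build the moral graph of the ancestral subgraph (undirected).
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:                      # parent-child edges
            nbrs[child].add(p)
            nbrs[p].add(child)
        for i, p in enumerate(ps):        # "marry" parents of a common child
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)

    # Search from x, never stepping onto a conditioned node.
    stack, seen = [x], {x}
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in nbrs[v]:
            if w not in seen and w not in given:
                seen.add(w)
                stack.append(w)
    return True


if __name__ == "__main__":
    chain = {"B": ["A"], "C": ["B"]}            # A -> B -> C
    collider = {"C": ["A", "B"], "D": ["C"]}    # A -> C <- B, C -> D
    print(d_separated(chain, "A", "C"))                 # False
    print(d_separated(chain, "A", "C", given=["B"]))    # True
    print(d_separated(collider, "A", "B"))              # True
    print(d_separated(collider, "A", "B", given=["D"])) # False
```

The collider case is the one that tends to surprise people: conditioning on a common effect, or on one of its descendants, opens a path that was closed.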

If you want to understand issues like d-separation, Thomas Richardson has some nice overviews, such as this recent talk:

http://www.stat.washington.edu/tsr/talks/high-dim-causal-learning.pdf

and this tech report with James Robins:

http://www.csss.washington.edu/Papers/wp128.pdf

P.S. Stan’s language just defines a log posterior — you can do that with a directed acyclic graphical model, but there’s no special status or inference provided for such representations.
