At the City University of New York Graduate Center, 365 Fifth Avenue (between 34th and 35th streets), room 6002. The topic: causality and statistical learning. Announcement is here (scroll down). It says that if you would like to attend any event, please respond by emailing datamining@gc.cuny.edu. I’m also giving a shorter talk on the same [...]
My talk in Chicago this Thurs 6:30pm
Choices in Visualizing Data This time, it’s not at the university, it’s at a data science meetup. Here are the slides. I actually prefer the term “statistical graphics” or “visualizing quantitative information” rather than “visualizing data.” I spend a lot of time graphing inferences and fitted models, understanding my fits and doing exploratory model analysis. [...]
Detecting predictability in complex ecosystems
A couple people pointed me to a recent article, “Detecting Causality in Complex Ecosystems,” by fisheries researchers George Sugihara, Robert May, Hao Ye, Chih-hao Hsieh, Ethan Deyle, Michael Fogarty, and Stephan Munch. I don’t know anything about ecology research but I could imagine this method being useful in that field. I can’t see the approach [...]
Can you write a program to determine the causal order?
Mike Zyphur writes: Kaggle.com has launched a competition to determine what’s an effect and what’s a cause. They’ve got correlated variables, they’re deprived of context, and you’re asked to determine the causal order. $5,000 prizes. I followed the link and the example they gave didn’t make much sense to me (the two variables were temperature [...]
My talk at the University of Michigan today 4pm
Causality and Statistical Learning Andrew Gelman, Statistics and Political Science, Columbia University Wed 27 Mar, 4pm, Betty Ford Auditorium, Ford School of Public Policy Causal inference is central to the social and biomedical sciences. There are unresolved debates about the meaning of causality and the methods that should be used to measure it. As a [...]
Why big effects are more important than small effects
The title of this post is silly but I have an important point to make, regarding an implicit model which I think many people assume even though it does not really make sense. Following a link from Sanjay Srivastava, I came across a post from David Funder saying that it’s useful to talk about the [...]
A must-read paper on statistical analysis of experimental data
Russ Lyons points to an excellent article on statistical experimentation by Ron Kohavi, Alex Deng, Brian Frasca, Roger Longbotham, Toby Walker, Ya Xu, a group of software engineers (I presume) at Microsoft. Kohavi et al. write: Online controlled experiments are often utilized to make data-driven decisions at Amazon, Microsoft . . . deployment and mining [...]
Fowlerpalooza!
Russ Lyons points us to a discussion in Statistics in Medicine of the famous claims by Christakis and Fowler on the contagion of obesity etc. James O’Malley and Christakis and Fowler present the positive case. Andrew Thomas and Tyler VanderWeele present constructive criticism. Christakis and Fowler reply. Coincidentally, a couple weeks ago an epidemiologist was [...]
That claim that students whose parents pay for more of college get worse grades
Theodore Vasiloudis writes: I came upon this article by Laura Hamilton, an assistant professor in the University of California at Merced, that claims that “The more money that parents provide for higher education, the lower the grades their children earn.” I can’t help but feel that there is something wrong with the basis of the study [...]
They’d rather be rigorous than right
Following up on my post responding to his question about that controversial claim that high genetic diversity, or low genetic diversity, is bad for the economy, Kyle Peyton writes: I’m happy to see you’ve articulated similar gripes I had w/ the piece, which makes me feel like I’m not crazy. I remember discussing this with [...]
The effects of fiscal consolidation
José Iparraguirre writes: I’ve read a recent paper by the International Monetary Fund on the effects of fiscal consolidation measures on income inequality (Fiscal Monitor October 2012, Appendix 1). They run a panel regression with 48 countries and 30 years (annual data) of a measure of income inequality (Gini coefficient) on a number of covariates, [...]
Understanding regression models and regression coefficients
David Hoaglin writes: After seeing it cited, I just read your paper in Technometrics. The home radon levels provide an interesting and instructive example. I [Hoaglin] have a different take on the difficulty of interpreting the estimated coefficient of the county-level basement proportion (gamma-sub-2) on page 434. An important part of the difficulty involves “other [...]
Statistical modeling, causal inference, and social science
Interesting discussion by Berk Ozler (which I found following links from Tyler Cowen) of a study by Erwin Bulte, Lei Pan, Joseph Hella, Gonne Beekman, and Salvatore di Falco that compares two agricultural experiments, one blinded and one unblinded. Bulte et al. find much different results in the two experiments and attribute the difference to [...]
New prize on causality in statistics education
Judea Pearl writes: Can you post the announcement below on your blog? And, by all means, if you find heresy in my interview with Ron Wasserstein, feel free to criticize it with your readers. I responded that I’m not religious, so he’ll have to look for someone else if he’s looking for findings of heresy. [...]
‘Researcher Degrees of Freedom’
False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant [I]t is unacceptably easy to publish “statistically significant” evidence consistent with any hypothesis. The culprit is a construct we refer to as researcher degrees of freedom. In the course of collecting and analyzing data, researchers have many decisions to make: Should [...]
Statistical discrimination again
Mark Johnstone writes: I’ve recently been investigating a new European Court of Justice ruling on insurance calculations (on behalf of MoneySuperMarket) and I found something related to statistics that caught my attention. . . . The ruling (which comes into effect in December 2012) states that insurers in Europe can no longer provide different premiums [...]
Comparing people from two surveys, one of which is a simple random sample and one of which is not
Juli writes: I’m helping a professor out with an analysis, and I was hoping that you might be able to point me to some relevant literature… She has two studies that have been completed already (so we can’t go back to the planning stage in terms of sampling, unfortunately). Both studies are based around the [...]
Using the “instrumental variables” or “potential outcomes” approach to clarify causal thinking
As I’ve written here many times, my experiences in social science and public health research have left me skeptical of statistical methods that hypothesize or try to detect zero relationships between observational data (see, for example, the discussion starting at the bottom of page 960 in my review of causal inference in the American Journal [...]
Update on Levitt paper on child car seats
A few years ago I noted the following quote from applied microeconomist Steven Levitt: Is it surprising that scientists would try to keep work that disagrees with their findings out of journals? When I told my father that I [Levitt] was sending my work saying car seats are not that effective to medical journals, he [...]
Multilevel modeling and instrumental variables
Terence Teo writes: I was wondering if multilevel models can be used as an alternative to 2SLS or IV models to deal with (i) endogeneity and (ii) selection problems. More concretely, I am trying to assess the impact of investment treaties on foreign investment. Aside from the fact that foreign investment is correlated over time, [...]
Examples of the use of hierarchical modeling to generalize to new settings
In a link to our back-and-forth on causal inference and the use of hierarchical models to bridge between different inferential settings, Elias Bareinboim (a computer scientist who is working with Judea Pearl) writes: In the past week, I have been engaged in a discussion with Andrew Gelman and his blog readers regarding causal inference, selection [...]
The treatment, the intermediate outcome, and the ultimate outcome: Leverage and the financial crisis
Gur Huberman points to an article on the financial crisis by Bethany McLean, who writes: Although our understanding of what instigated the 2008 global financial crisis remains at best incomplete, there are a few widely agreed upon contributing factors. One of them is a 2004 rule change by the U.S. Securities and Exchange Commission that [...]
Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings
Elias Bareinboim asked what I thought about his comment on selection bias in which he referred to a paper by himself and Judea Pearl, “Controlling Selection Bias in Causal Inference.” I replied that I have no problem with what he wrote, but that from my perspective I find it easier to conceptualize such problems in [...]
More questions on the contagion of obesity, height, etc.
AT discusses [link broken; see P.P.S. below] a new paper of his that casts doubt on the robustness of the controversial Christakis and Fowler papers. AT writes that he ran some simulations of contagion on social networks and found that (a) in a simple model assuming the contagion of the sort hypothesized by Christakis and [...]
Is linear regression unethical in that it gives more weight to cases that are far from the average?
I received the following note from someone who’d like to remain anonymous: