The happiness gene: My bottom line (for now)

I had a couple of email exchanges with Jan-Emmanuel De Neve and James Fowler, two of the authors of the article on the gene that is associated with life satisfaction which we blogged the other day. (Bruno Frey, the third author of the article in question, is out of town according to his email.) Fowler also commented directly on the blog.

I won’t go through all the details, but now I have a better sense of what’s going on. (Thanks, Jan and James!) Here’s my current understanding:

1. The original manuscript was divided into two parts: an article by De Neve alone published in the Journal of Human Genetics, and an article by De Neve, Fowler, Frey, and Nicholas Christakis submitted to Econometrica. The latter paper repeats the analysis from the Adolescent Health survey and also replicates with data from the Framingham heart study (hence Christakis’s involvement).

The Framingham study measures a slightly different gene and uses a slightly different life-satisfaction question compared to the Adolescent Health survey, but De Neve et al. argue that they’re close enough for the study to be considered a replication. I haven’t tried to evaluate this particular claim but it seems plausible enough. They find an association with a p-value of exactly 0.05. That was close! (For some reason they don’t control for ethnicity in their Framingham analysis–maybe that would pull the p-value to 0.051 or something like that?)

2. Their gene is correlated with life satisfaction in their data and the correlation is statistically significant. The key to getting statistical significance is to treat life satisfaction as a continuous response rather than to pull out the highest category and call it a binary variable. I have no problem with their choice; in general I prefer to treat ordered survey responses as continuous rather than discarding information by combining categories.

3. But given their choice of a continuous measure, I think it would be better for the researchers to stick with it and present results as points on the 1-5 scale. From their main regression analysis on the Adolescent Health data, they estimate the effect of having two (compared to zero) “good” alleles as 0.12 (+/- 0.05) on a 1-5 scale. That’s what I think they should report, rather than trying to use simulation to wrestle this into a claim about the probability of describing oneself as “very satisfied.”

They claim that having the two alleles increases the probability of describing oneself as “very satisfied” by 17%. That’s not 17 percentage points, it’s 17%, thus increasing the probability from 41% to 1.17*41% = 48%. This isn’t quite the 46% that’s in the data but I suppose the extra 2% comes from the regression adjustment. Still, I don’t see this as so helpful. I think they’d be better off simply describing the estimated improvement as 0.1 on a 1-5 scale. If you really really want to describe the result for a particular category, I prefer percentage points rather than percentages.
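
To make the two framings concrete, here is a minimal arithmetic sketch (in Python, using the numbers quoted above; the code and variable names are mine, not the authors’).

```python
# Percent vs. percentage points, using the figures discussed above.
baseline = 0.41           # P(very satisfied) with zero "good" alleles
relative_increase = 0.17  # the paper's "17%" claim is a relative change

model_based = baseline * (1 + relative_increase)
print(f"relative framing:         {baseline:.0%} -> {model_based:.0%}")
print(f"percentage-point framing: +{(model_based - baseline) * 100:.0f} points")
# For comparison, the raw-data difference is 46% - 41% = 5 percentage points.
```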

4. Another advantage of describing the result as 0.1 on a 1-5 scale is that it is more consistent with the intuitive meaning of 1% of variance explained. It’s good they have this 1% in their article–I should present such R-squared summaries in my own work, to give a perspective on the sizes of the effects that I find.
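
As a rough check on how an effect of about 0.1 on the 1-5 scale and a variance-explained figure of around 1% can go together, here is a back-of-the-envelope sketch; the allele frequency and the outcome standard deviation are placeholder assumptions of mine, not values from either paper, so the output is only an order of magnitude.

```python
# Order-of-magnitude check: variance explained by a single variant entered
# linearly is roughly  R^2 = slope^2 * Var(allele count) / Var(outcome).
# The allele frequency and outcome SD below are hypothetical placeholders.
slope = 0.06                 # per-allele effect (half the 0.12 two-vs-zero estimate)
p = 0.5                      # assumed frequency of the "good" allele
var_count = 2 * p * (1 - p)  # variance of the allele count under Hardy-Weinberg
sd_outcome = 0.6             # assumed SD of life satisfaction on the 1-5 scale

r2 = slope**2 * var_count / sd_outcome**2
print(f"variance explained ~ {r2:.2%}")  # about half a percent with these inputs
```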

5. I suspect the estimated effect of 0.1 is an overestimate. I say this for the usual reason, discussed often on this blog, that statistically significant findings, by their very nature, tend to be overestimates. I’ve sometimes called this the statistical significance filter, although “hurdle” might be a more appropriate term.
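
The filter is easy to demonstrate by simulation. The sketch below takes the 0.12 estimate and 0.05 standard error at face value as the truth and shows that the replications which happen to cross the significance threshold average noticeably higher than the true value; this is an illustration of the general point, not an analysis of these data.

```python
# The statistical significance filter: among estimates that clear p < 0.05,
# the average exceeds the true effect. The true effect and SE are taken from
# the discussion above; the simulation itself is a stylized illustration.
import numpy as np

rng = np.random.default_rng(0)
true_effect, se = 0.12, 0.05
estimates = rng.normal(true_effect, se, size=100_000)   # hypothetical replications
significant = estimates[np.abs(estimates) > 1.96 * se]  # those reaching p < 0.05

print(f"mean of all estimates:         {estimates.mean():.3f}")   # ~0.120
print(f"mean of significant estimates: {significant.mean():.3f}") # ~0.147
```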

6. Along with the 17% number comes a claim that having one allele gives an 8% increase. 8% is half of 17% (subject to rounding) and, indeed, their estimate for the one-allele case comes from their fitted linear model. That’s fine–but the data aren’t really informative about the one-allele case! I mean, sure, the data are perfectly consistent with the linear model, but the nature of leverage is such that you really don’t get a good estimate of the curvature of the dose-response function. (See my 2000 Biostatistics paper for a general review of this point.) The one-allele estimate is entirely model-based. It’s fine, but I’d much prefer simply giving the two-allele estimate and then saying that the data are consistent with a linear model, rather than presenting the one-allele estimate as a separate number.
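
To see how weakly the data constrain the curvature, here is a toy simulation, entirely my own: the allele frequency, noise level, and sample size are made up. The linear fit pins down its slope reasonably well, but the separate one-allele contrast, and especially the deviation from linearity, come with much more uncertainty.

```python
# Toy illustration: with allele counts 0/1/2, a linear model forces the
# one-allele effect to be half the two-allele effect; a saturated model
# estimates it directly but with more uncertainty, and the curvature
# (departure from linearity) is poorly determined. All inputs are made up.
import numpy as np

rng = np.random.default_rng(1)
n = 2500
count = rng.binomial(2, 0.4, size=n)            # hypothetical allele counts
y = 0.06 * count + rng.normal(0, 0.8, size=n)   # outcome from a truly linear model

def ols(X, y):
    """Least squares with the usual covariance estimate."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, sigma2 * XtX_inv

# (a) linear in allele count
X_lin = np.column_stack([np.ones(n), count])
b_lin, cov_lin = ols(X_lin, y)

# (b) saturated: separate indicators for one and two alleles
X_sat = np.column_stack([np.ones(n), count == 1, count == 2])
b_sat, cov_sat = ols(X_sat, y)

c = np.array([0.0, -2.0, 1.0])                  # curvature contrast: two - 2*one
print(f"linear one-allele effect:    {b_lin[1]:.3f} (se {np.sqrt(cov_lin[1, 1]):.3f})")
print(f"saturated one-allele effect: {b_sat[1]:.3f} (se {np.sqrt(cov_sat[1, 1]):.3f})")
print(f"curvature:                   {c @ b_sat:.3f} (se {np.sqrt(c @ cov_sat @ c):.3f})")
```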

7. The news reports were indeed horribly exaggerated. No fault of the authors but still something to worry about. The Independent’s article was titled, “Discovered: the genetic secret of a happy life,” and the Telegraph’s was not much better: “A ‘happiness gene’ which has a strong influence on how satisfied people are with their lives, has been discovered.” An effect of 0.1 on a 1-5 scale: an influence, sure, but a “strong” influence?

8. There was some confusion with conditional probabilities that made its way into the reports as well. From the Telegraph:

The results showed that a much higher proportion of those with the efficient (long-long) version of the gene were either very satisfied (35 per cent) or satisfied (34 per cent) with their life – compared to 19 per cent in both categories for those with the less efficient (short-short) form.

After looking at the articles carefully and having an email exchange with De Neve, I can assure you that the above quote is indeed wrong, which is really too bad because it was an attempted correction of an earlier mistake. The correct numbers are not 35, 34, 19, 19. Rather, they are 41, 46, 37, 44. A much less dramatic difference: 4 and 2 percentage points rather than 16 and 15. The Telegraph reporter was giving P(gene|happiness) rather than P(happiness|gene). What seems to have happened is that he misread Figure 2 in the Human Genetics paper. He then may have got stuck on the wrong track by expecting to see a difference of 17%.
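
The distinction is easiest to see in a toy contingency table. The counts below are invented (in particular, I have assumed equal numbers in the two genotype groups purely for illustration); only the 41% and 37% row proportions echo the figures above.

```python
# P(happiness | gene) vs. P(gene | happiness) in a made-up table.
import numpy as np

# rows: genotype (long-long, short-short); columns: very satisfied, not
counts = np.array([[410, 590],
                   [370, 630]])

p_happy_given_gene = counts[:, 0] / counts.sum(axis=1)   # what the paper reports
p_gene_given_happy = counts[:, 0] / counts[:, 0].sum()   # what the reporter gave

print("P(very satisfied | genotype):", np.round(p_happy_given_gene, 2))  # [0.41 0.37]
print("P(genotype | very satisfied):", np.round(p_gene_given_happy, 2))  # [0.53 0.47]
```

The two sets of numbers answer different questions, and only the first corresponds to the comparison the articles are making.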

9. The abstract for the Human Genetics paper reports a p-value of 0.01. But the baseline model (Model 1 in Table V of the Econometrica paper) reports a p-value of 0.02. The lower p-values are obtained by models that control for a big pile of intermediate outcomes.

10. In section 3 of the Econometrica paper, they compare identical to fraternal twins (from the Adolescent Health survey, it appears) and estimate that 33% of the variation in reported life satisfaction is explained by genes. As they say, this is roughly consistent with estimates of 50% or so from the literature. I bet their 33% has a big standard error, though: one clue is that the difference in correlations between identical and fraternal twins is barely statistically significant (at the 0.03 level, or, as they quaintly put it, 0.032). They also estimate 0% of the variation to be due to common environment, but again that 0% is gonna be a point estimate with a huge standard error.

I’m not saying that their twin analysis is wrong. To me the point of these estimates is to show that the Adolescent Health data are consistent with the literature on genes and happiness, thus supporting the decision to move on with the rest of their study. I don’t take their point estimates of 33% and 0% seriously but it’s good to know that the twin results go in the expected direction.
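
For readers who haven’t seen such numbers derived, here is a sketch of the classical (Falconer) twin calculation; the two twin correlations are hypothetical values back-solved to match the 33% and 0% point estimates, not figures taken from the paper.

```python
# Classical twin decomposition (Falconer's formulas), with hypothetical
# twin correlations chosen to reproduce the 33% / 0% point estimates.
r_mz = 0.33    # assumed correlation between identical (MZ) twins
r_dz = 0.165   # assumed correlation between fraternal (DZ) twins

h2 = 2 * (r_mz - r_dz)   # heritability (genes)
c2 = 2 * r_dz - r_mz     # shared (common) environment
e2 = 1 - r_mz            # unique environment plus measurement error

print(f"heritability ~ {h2:.0%}, shared environment ~ {c2:.0%}, unique ~ {e2:.0%}")
```

Because the heritability estimate is twice the difference of two noisy correlations, its standard error is roughly twice the standard error of that difference, which is why a barely significant gap between the MZ and DZ correlations translates into a very imprecise 33%.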

11. One thing that puzzles me is why De Neve et al. only studied one gene. I understand that this is the gene that they expected to relate to happiness and life satisfaction, but . . . given that it only explains 1% of the variation, there must be hundreds or thousands of genes involved. Why not look at lots and lots? At the very least, the distribution of estimates over a large sample of genes would give some sense of the variation that might be expected. I can’t see the point of looking at just one gene, unless cost is a concern. Are other gene variants already recorded for the Adolescent Health and Framingham participants?
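
As a sketch of what the distribution of estimates across many genes could tell you, here is a small simulation under the purely hypothetical assumption that a large panel of genes has no true effect, each estimated with the same 0.05 standard error as above.

```python
# What a scan of many truly null genes would look like with se = 0.05.
import numpy as np

rng = np.random.default_rng(2)
n_genes, se = 1000, 0.05
estimates = rng.normal(0.0, se, size=n_genes)   # no real effects anywhere

n_sig = int(np.sum(np.abs(estimates) > 1.96 * se))
print(f"'significant' null genes at p < 0.05: {n_sig} of {n_genes} (expect about 50)")
print(f"largest null estimate: {np.abs(estimates).max():.2f} on the 1-5 scale")
```

Seeing where the single reported estimate falls within such a distribution would say a lot more than the estimate on its own.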

12. My struggles (and the news reporters’ larger struggles) with the numbers in these articles make me feel, even more strongly than before, the need for a suite of statistical methods for building from simple comparisons to more complicated regressions. (In case you’re reading this, Bob and Matt3, I’m talking about the network of models.)

As researchers, we should aim for transparency. This is sometimes hindered by scientific journals’ policies of brevity. You can end up having to remove lots of the details that make a result understandable.

13. De Neve concludes the Human Genetics article as follows:

There is no single “happiness gene.” Instead, there is likely to be a set of genes whose expression, in combination with environmental factors, influences subjective well-being.

I would go even further. Accepting their claim that between one-third and one-half of the variation in happiness and life satisfaction is determined by genes, and accepting their estimate that this one gene explains as much as 1% of the variation, and considering that this gene was their #1 candidate (or at least a top contender) for the “happiness gene” . . . my guess is that the set of genes that influence subjective well-being is a very large number indeed! The above disclaimer doesn’t seem disclaimery-enough to me, in that it seems to leave open the possibility that this “set of genes” might be just three or four. Hundreds or thousands seems more like it.

I’m reminded of the recent analysis that found that the simple approach of predicting a child’s height with a regression model on the parents’ average height performs much better than a method based on combining information from 54 genes.
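
The arithmetic behind that guess is simple enough to spell out; the only assumption, made purely for a rough lower bound, is that the genetic contributions are independent and additive.

```python
# Rough lower bound on the size of the "set of genes": if genes explain
# one-third to one-half of the variation and the best candidate explains at
# most 1%, then even if every gene mattered as much as the best one you
# would need dozens. Independent, additive contributions are assumed.
top_gene_share = 0.01
for total_genetic_share in (1 / 3, 1 / 2):
    n_genes = total_genetic_share / top_gene_share
    print(f"total {total_genetic_share:.0%} / top gene {top_gene_share:.0%} "
          f"-> at least {n_genes:.0f} genes")
```

In practice the per-gene contributions fall off quickly after the best candidates, which is how one gets from this lower bound of a few dozen to the hundreds or thousands guessed above.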

14. Again, I’m not trying to present this as any sort of debunking, merely trying to fit these claims in with the rest of my understanding. I think it’s great when social scientists and public health researchers can work together on this sort of study. I’m sure that in a couple of decades we’ll have a much better understanding of genes and subjective well-being, but you have to start somewhere. This is a clean study that can be the basis for future research.

Hmmm . . . could I publish this as a letter in the Journal of Human Genetics? Probably not, unfortunately.

P.S. You could do this all yourself! This and my earlier blog on the happiness gene study required no special knowledge of subject matter or statistics. All I did was tenaciously follow the numbers and pull and pull until I could see where all the claims were coming from. A statistics student, or even a journalist with a few spare hours, could do just as well. (Why I had a few spare hours to do this is another question. The higher procrastination, I call it.) I probably could’ve done better with some prior knowledge–I know next to nothing about genetics and not much about happiness surveys either–but I could get pretty far just tracking down the statistics (and, as noted, without any goal of debunking or any need to make a grand statement).

P.P.S. See comments for further background from De Neve and Fowler!

12 thoughts on “The happiness gene: My bottom line (for now)”

  1. Slightly tongue in cheek, I might suggest this is a gene that causes people to make pencil marks on the right (where the 5 is) rather than on the left (where the 1 is).

  2. Thank you! Very helpful and very kind of you to wrap up our conversation in your influential blog!

    You ask why we didn't include more genes in our analysis. Add Health has about 10 or so genotypes available in the data, and we included them in the analysis in Table V in the appendix of the larger paper. 5-HTTLPR continues to come in significantly; no other genotypes do. And the R-squared increases slightly, indicating that we are indeed explaining a little more of the variance when including more genetic variation.

    We focused on 5-HTTLPR from the start because it's a well-studied genotype that had been linked (though inconclusively) to mental well-being before; hence, it became our "candidate" gene. It's crucial in behavior genetics to state one's ex ante hypothesis, as the temptation is to check a great many genotypes and then report on any significant ones without Bonferroni correction (for multiple testing).

    We have current projects under way, however, that take the integration of social sciences and genetics to a whole new level, where we do genome-wide association studies (GWAS) of phenotypes (i.e., traits) such as educational attainment on almost 100,000 individuals. These studies consider over half a million SNPs (points on our genome where humans may differ) and see if any make a significance threshold of p

  3. Forgot to add: the reason why no ethnicity variables are part of the Framingham replication models is because these folks are virtually all white (and of Italian origin!). That's actually a good thing in behavior genetics as it guards against population stratification driving a genetic association (beyond what any ethnicity control variables could prevent).

  4. Daniel:

    Sure, but given that they have a reasonable continuous measure, I don't see the point of going to the trouble.

  5. 1. We don't control for ethnicity in Framingham because they are all white!

    To control for population stratification we use the first 10 principal components of a singular value decomposition of the person-by-gene matrix. There is some other nice work ( http://www.nature.com/nature/journal/v456/n7218/a… ) that shows you can reconstruct the map of Europe from the first two components, and we cite research in the paper suggesting that adding 10 components is sufficient to control for associations related to common ancestry.

    3-4. We'll consider changing the way we present the results — in part, this way of thinking about the presentation is a leftover from other papers where we had a dichotomous dependent variable. But we'll wait to see what our referees have to say, too.

    5. I wonder if the claims about the effect being an overestimate are mitigated by the fact that we replicate (though, as you note, just barely!).

    6. I agree we don't have enough power to test non-linearity in the alleles, so the one-allele result is model dependent. But even though it is redundant, I think it fixes in the reader's mind the approximate effect size we think we've found.

    7-8. Granted on news distortion — I've been iterating my approach with reporters the last five years to try and damp this stuff down as much as possible.

    10. You are right that the confidence intervals on the heritability estimates are wide. In other papers where the heritability estimate is the focus, we have used a nice ternary plot to characterize the uncertainty (see http://jhfowler.ucsd.edu/heritability_of_cooperat… or http://jhfowler.ucsd.edu/genetic_basis_of_politic… ).

    11. We studied SLC6A4 primarily because of previous research on the gene and optimism and other work that suggests the gene might be associated with happiness. If we instead just blindly do association studies with all six genes available in Add Health, then it's necessary to correct the results for multiple testing. This is a big issue in genetics — they have been so disappointed with lack of replication, that many refuse to consider an a priori hypothesis as grounds for studying a single gene. But in this case we have (at least) one replication. To see an example of how this looks when there is not an a priori hypothesis, take a look at our recent PNAS study on genetic homophily in social networks ( http://jhfowler.ucsd.edu/cooperative_behavior_cas… ).

    12. I agree it's important to present raw results. This is why we show summary statistics and baseline models….

    13. When I talk to reporters, and sometimes in published research, I say there are likely to be "hundreds of genes" involved in any one complex social behavior.

    Thanks again Andrew — I love your work and I am a frequent lurker on this blog!

  6. On points 11 and 13: Isn't it just as plausible that genes have almost no relationship with happiness? You are right that either there are hundreds of genes involved or else there is no significant effect even with many genes. The height example is a good one: We now know that diet (a non-genetic factor) has a huge impact on height. I guess my question really is: Are the twin studies really informative? It looks like only barely. And you might imagine that being an identical vs. fraternal twin has a non-trivial, differentiated impact on your happiness, and maybe even ties it somewhat (positively or negatively) to the happiness of your sibling (thus complicating the studies).

  7. Jan, James: Thanks for the replies.

    David: I haven't looked at the twin studies in detail but there seems to be a lot of literature on the heritability of happiness, and it certainly is plausible to me. I agree that twin studies have issues but it's hard for me to imagine that genetics has no input on happiness. Regarding height, my point is that a huge amount of the variation is genetic (especially when considering a population of generally well-fed mothers and children), yet it's hard to find this, even in a set of 50 genes.

  8. Andrew: with p-value of exactly 0.05. That was close!
    James: we replicate (though, as you note, just barely!).

    p-values are far from ideal for investigating replication (estimates and SEs would be better, at least), and as the whole blog post indicates, it's challenging.

    Andrew: (For some reason they don't control for ethnicity in their Framingham analysis–maybe that would pull the p-value to 0.051 or something like that?)

    That's what Greenland called confirmatory bias: adjusting the analyses until the result matches the original claim (which makes it easier to get published).

    K?

  9. Re 6: there's work showing that the additive component of the genetic 'signal' is the part that's best preserved, in all the noise that comes when we measure variants that are not the truly causal variant – a form of measurement error, if you like.

    This motivates the standard approach of fitting additive relationships, and there is a plausible power gain if we focus the analysis on the linear term, over something more general.

    This shouldn't stand in the way of reporting allele-specific results, of course – and there is no uniformly "best" way to do this analysis – but it's another argument to add to the interpretability point that James mentioned.

  10. Authors often bemoan the poor reporting of research results. What troubles me is that those same authors are actively promoting their research in the media. I've even seen this when the results haven't been through a review process. Maybe it's overly dramatic, but the pursuit of attention often seems more important than the pursuit of truth.
