Latest gay gene tabloid hype

The tabloid in question is the journal Nature, which along with Science and PPNAS (the Proceedings of the National Academy of Sciences, publisher of gems such as the himmicanes and hurricanes study) has in recent years become notorious for publishing flashy but unsubstantiated scientific claims.

As Lord Acton never said, publicity corrupts, and absolute publicity corrupts absolutely. We now have a vicious cycle where journalists rely on Nature/Science/PPNAS for science news that is both exciting and respectable (these are considered top journals, after all), where scientists send their best and most exciting work to these journals (we all want publicity for our best work), and where Nature/Science/PPNAS thrive on the publicity.

The usual story is that we notice flaws in published papers that had managed to get through the scientific review process.

Today’s story is slightly different in that it is a report of a conference presentation for which there is no publication, indeed not even a preprint that we could look at.

I learned about this one from statistician Thomas Lumley, who links to a news article by Sara Reardon in Nature with the subheadline, “Twin study reveals five DNA markers that are associated with sexual orientation.” Reardon reports:

Researchers collected DNA samples in saliva from 37 pairs of identical twins in which only one twin was gay, and 10 pairs in which both were gay. By scanning the twins’ epigenomes, the researchers found five epi-marks that were more common among the gay men than in their genetically identical straight brothers. An algorithm they developed based on the five epi-marks could correctly predict the sexual orientation of men in the study 67% of the time. UCLA computational geneticist Tuck Ngun will present the work on 8 October at the American Society of Human Genetics meeting in Baltimore, Maryland.

There are some problems here. First, as Lumley points out:

70% accuracy doesn’t seem all that impressive. Using the usual figures on the proportion of men who are gay, the approach of assuming everyone is straight unless you are told otherwise is better than 90% accurate, and doesn’t need expensive genetics. Presumably they mean something different by 70% accuracy, but we don’t know what.

Indeed, it’s hard to know exactly what they did. Here’s a guess. The study in question seems to have 57 gay men and 37 straight men: the 37 discordant pairs contribute 37 gay and 37 straight men, and the 10 concordant pairs add 20 more gay men. So in this case the optimal rule is to always guess gay, which would be correct 61% of the time. So a rule that is correct 67% of the time in this sample is not very impressive, especially given that they got to choose the rule based on their data!
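To make that arithmetic explicit, here is a minimal sketch. It assumes my reading of the Nature report is right (37 discordant pairs and 10 concordant pairs, every twin counted once); the variable names are mine.

```python
# Counts taken from the Nature report, under the assumption that
# each twin in each pair is counted as one man in the sample.
discordant_pairs = 37   # one gay twin, one straight twin
concordant_pairs = 10   # both twins gay

gay = discordant_pairs + 2 * concordant_pairs
straight = discordant_pairs
total = gay + straight

# Accuracy of the data-free rule "always guess the majority class".
baseline = max(gay, straight) / total

print(gay, straight, total)
print(round(baseline, 3))
```

Any reported accuracy for this sample should be compared with that baseline, not with 50%.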

And that brings me to my second concern, which is statistical. The report says that researchers found 5 epi-marks based on a sample of 47 twin pairs. This looks like a serious selection problem: it may not be hard to find 5 things that fit your data even if everything were pure noise.
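Here is a rough simulation of that worry. It is a sketch under loud assumptions, not a reconstruction of what the researchers did: the number of candidate markers (1000) is hypothetical, since the report does not say how many were screened, and the simple vote-counting rule is mine. Every marker is a coin flip, so there is no real signal to find.

```python
import random

def simulate(n_gay=57, n_straight=37, n_markers=1000, n_keep=5, seed=0):
    """In-sample accuracy of a rule built from the best pure-noise markers."""
    rng = random.Random(seed)
    labels = [1] * n_gay + [0] * n_straight
    n = len(labels)

    # Hypothetical candidate markers: independent fair coin flips per person.
    markers = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_markers)]

    # Orient each marker so it agrees with the labels at least half the time,
    # then keep the n_keep markers that agree best with the labels.
    oriented = []
    for marker in markers:
        agree = sum(m == y for m, y in zip(marker, labels)) / n
        if agree < 0.5:
            marker = [1 - m for m in marker]
            agree = 1 - agree
        oriented.append((agree, marker))
    oriented.sort(key=lambda pair: pair[0], reverse=True)
    chosen = [marker for _, marker in oriented[:n_keep]]

    # Score each man by how many chosen markers he carries, then pick the
    # cutoff that maximizes accuracy on the same data used for selection.
    scores = [sum(marker[i] for marker in chosen) for i in range(n)]
    best = 0.0
    for cutoff in range(n_keep + 2):
        acc = sum((s >= cutoff) == bool(y) for s, y in zip(scores, labels)) / n
        best = max(best, acc)
    return best

in_sample_accuracy = simulate()
print(round(in_sample_accuracy, 3))
```

Because a cutoff of 0 amounts to "always guess gay," the selected rule can never do worse in-sample than the 61% baseline, and any margin above it here is produced by selection alone.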

In her news article, Reardon does write:

Associations found in small studies are prone to evaporate when tested in larger groups.

It’s great to put this warning in, but this is one sentence buried 16 sentences deep within her article, and it doesn’t really do much to counteract the impression given by the generally positive and unquestioning tone of the article.

I was curious what other publicity this study has received so I did a search on Google News. 148 articles were listed, including in respected publications such as the Telegraph and Guardian of London, and the Los Angeles Times.

One of the better reports is by Virginia Hughes of BuzzFeed. The headline there is, “Epigenetic Test Can Predict Homosexuality, Controversial Study Claims.” Well put.

And, in the Atlantic, Ed Yong’s report is headlined, “No, Scientists Have Not Found the ‘Gay Gene’: The media is hyping a study that doesn’t do what it says it does.” Yong mentions the overfitting concern noted above and provides some additional details:

As far as could be judged from the unpublished results presented in the talk, the team used their training set to build several models for classifying their twins, and eventually chose the one with the greatest accuracy when applied to the testing set.

I don’t know how Yong got this information—perhaps someone leaked a preprint of the research paper to him? But, in any case, yeah, this selection problem is a big deal.
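The procedure Yong describes, choosing among several models by their accuracy on the testing set, inflates the reported number even when no model has any skill. A minimal sketch, with a hypothetical test-set size and a hypothetical number of candidate models (neither is given in the reports), and with "models" that are nothing but random guessers:

```python
import random

rng = random.Random(1)
n_test = 20     # hypothetical size of the testing set
n_models = 50   # hypothetical number of candidate models tried

# True labels for the testing set (arbitrary for this illustration).
truth = [rng.randint(0, 1) for _ in range(n_test)]

def accuracy_on_held_out(predictions):
    """Fraction of testing-set labels that the predictions match."""
    return sum(p == y for p, y in zip(predictions, truth)) / n_test

# Each "model" guesses at random, so its true accuracy is 50%.
accuracies = []
for _ in range(n_models):
    guesses = [rng.randint(0, 1) for _ in range(n_test)]
    accuracies.append(accuracy_on_held_out(guesses))

best_accuracy = max(accuracies)              # what gets reported after selection
mean_accuracy = sum(accuracies) / n_models   # what a typical model achieves

print(best_accuracy, mean_accuracy)
```

Once the testing set has been used to pick the winner, it is no longer an independent check; an honest estimate would need data that played no part in the choice.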

To put it another way, why should we believe these headlines? Because someone from a respected university gave a conference talk on it? That’s not enough: conference talks are full of speculative research efforts. Because it was featured in a news article in Nature? No.

As I see it, the problem is not with the research itself—I disagree with Yong’s statement that perhaps the best option with this research is “to not do it at all”—but with its presentation as truth, or even as provisional truth. Speculation is fine, just label it as such.

P.S. Reardon also refers unquestioningly to the claim that “the chance of a man being gay increases by 33% for each older brother he has,” but the link here is to a 2001 paper on the topic by Ray Blanchard, and last time I looked at this work, back in 2006, I had some concerns about this claim (also see comments on that post for some interesting discussion).

P.P.S. More here. Researcher Tuck Ngun defends his twin study and I remain skeptical.

15 thoughts on “Latest gay gene tabloid hype”

  1. If PNAS / Nature / Science are crappy what are the good Journals?

    Why project from one crappy study to the whole Journal? Sure they’ve published other crap but what Journal hasn’t?

    Academic publishing as a whole has deep, serious problems. Focus on PNAS makes it seem a journal specific problem.

    • Rahul:

      I never said these journals are crappy, I just said they’ve become notorious for publishing flashy but unsubstantiated scientific claims.

      When it comes to social science, I do think these journals publish some crap that would never pass muster at the American Political Science Review, for example. One trouble is that they seem to specialize in short crisp papers that can be summarized in a snappy headline. But science doesn’t always work that way.

      I focus on the tabloids in part because their reputation is such that papers published there are often assumed by science journalists to be true and correct. If journalists were to ignore PPNAS, I’d be happy to do so also.

      • When you write “they’ve become notorious for publishing flashy but unsubstantiated scientific claims” it sounded to me like you meant they are worse than most other journals. e.g. “Camden, NJ has become notorious for its high crime rate.”

        Which I don’t think is the case. They are no worse than the typical academic Journal.

        • I dread it when info I want appears to be published in Nature/Science and their ilk. Almost without fail it will be impossible to figure out what was actually done. Poor/abbreviated methods sections are not limited to those prestige journals, but are definitely more common there. It also depends on the date; the absolute worst is Science from ~2000.

        • But, even if true, a “typical academic journal” is not attracting as much publicity. In addition, the more striking the result, the more probable that it is a fluke. I vaguely remember some graph making the rounds on the internet a few years ago, showing that the more prestigious the journal where the paper is published, the more probable that the claimed effect will shrink in the future.

        • As Edward Tufte put it, “In medical research, too often the first published study testing a new treatment provides the strongest evidence that will ever be found for that treatment…Years after the initial study, as the Evidence Decay Cycle plays out, sometimes the only remaining issue is whether the treatment is in fact harmful.”

  2. It seems Ed Yong was at the American Society of Human Genetics meeting and presumably was able to attend the talk and/or talked with people who did.

    A crucial flaw of this study that has nothing to do with statistics is that they studied methylation of DNA samples from saliva. People in the epigenetics field say “One genome, many epigenomes.” It means that, even though the cells in your body have an identical genome (with the exception of B cells and T cells that undergo genetic rearrangements), you have many different cell types and different “epigenomes” associated with them. (I prefer not to use terms like epigenetics and epigenome that often get abused. But I think you get the point.) Saliva contains a mixture of cells, and most of those cells should have little relevance to social/sexual behavior. Why would you even expect that you can get anything informative from such a study?
