
“Men with large testicles”

Above is the title of an email I received from Marcel van Assen. We were having a discussion of PPNAS papers (I was relating my frustration about Case and Deaton’s response to my letter with Auerbach on age adjustment in mortality trends), and van Assen wrote:

We also commented on a paper in PNAS. The original paper was by Fanelli & Ioannidis on “US studies may overestimate effect sizes in softer research.”

Their analyses smelled, with a rather obscure “to the power .25” transformation of effect size. We re-analyzed their data in a more obvious way, and found nothing.

We were also amazed that the original paper’s authors waved away our re-analyses.

The most interesting PNAS article I [van Assen] read last year is the one with Rilling as one of the authors, arguing that men with bigger balls take less care of their children than men with smaller balls (yes, I mean testicles).

It had some p-values very close to .05. At that time I requested their data. I got the data, together with a request not to share them. I did not follow up on this paper… Recently, I noticed Rilling is also an author of at least one of the neuropsychology papers with huge correlations between a behavioral measure and neural activity.

OK, and here’s the promised cat picture.

Cute, huh?


  1. Z says:

    Looks like that cat takes bad care of its kids

  2. anon says:

    The balls effect estimates might be sensitive to sample restrictions. From the paper:

    “One participant’s testes volume measurement was excluded because his value was 2.8 SDs above the mean (mean = 38,064; SD = 11,183) and was more than 13,000 mm³ larger than any recorded value found in the literature”

  3. Jonathan (another one) says:

    Is that average testicle size or aggregate testicle size? What to do about the monorchid?

  4. Keith O'Rourke says:

    Well mine are not big enough to think I could get a robust analysis in an area like this.

    Within a meta-analysis, if there is a real concern about publication bias, there isn’t an obvious analysis to do, and the last time I looked into this, sensitivity analyses like John Copas’s seemed most appropriate.

    For a meta-meta-analysis, assuming that the form of publication bias is common across meta-analyses is likely a poor assumption.

    Also, if publication bias is worse among US-authored papers, why remove its effect on over-estimation?

    Scaling is also likely important, as the underlying data are binary (odds ratios?) or were made binary, and in some studies in some meta-analyses they are very sparse.

    If I had to do something here, I would try to get some ranking within each meta-analysis and then try to pool those rankings in some thoughtful way to get at the general pattern.

    From a quick look at the response in which the “original paper’s authors waved away our re-analyses,” I think they are raising some real issues that need to be thought through. Their approach, which “analyzed primary studies as a population of deviation scores, nested by meta-analysis,” seems reasonable, though maybe not drastic enough (ranking throws away more _information_, which in a context like this is most likely misleading).

  5. Shravan says:

    This is a well-known effect that is widely discussed in father mailing lists online. The chafing that happens due to unusually large testicles is very distracting, noisy and painful, so that the father spends all his mental energy on trying to keep his legs apart at all times. This is so exhausting that it leaves very little energy for looking after the child.

  6. cugrad says:

    Looks like the cat has large hands, too. In all seriousness, I am surprised PNAS approved the large balls paper.

  7. Simon Gates says:

    Bizarre that the Fanelli paper should fall into some of the traps that it is warning about – they explicitly mention “researcher degrees of freedom” but have plenty of them themselves, not to mention the “p<0.05 and I'm outta here” (copyright A. Gelman) attitude. And that's before we start arguing about the specific analyses. Surprising from John Ioannidis, but then I guess it's terribly easy to get dragged into things that aren't great. I've just had a paper published where I thought the analysis was total crap (I didn't do it) – I thought about taking my name off it but didn't in the end. Maybe I should have done. I'm tempted to offer it for dissection here!

    • Keith O'Rourke says:


      I have mostly read the original paper by Fanelli & Ioannidis, and the methods were well thought out (splitting outcomes into magnitude and direction and fitting a random intercept for within-meta-analysis magnitude was clever).

      I am fully convinced that there is not an obvious way to analyze data like this.

      I also am not convinced the findings are robust, and I worry the putative effects could be due to the conversion of effects to the odds-ratio metric, the use of inverse-variance weighted methods rather than actual likelihoods (they noted an unweighted analysis did not support the findings), lack of model fit, etc.

      Or maybe I have not spent enough time and effort to rule out these concerns…

      • Keith O'Rourke says:

        Not sure anyone is still interested in the Fanelli & Ioannidis paper, but I recalled one of the subtler issues in meta-analysis.

        If you send a set of samples out to half a dozen or so labs, most of which are considered competent, the lab(s) with discrepant results likely did something wrong.

        In much of research, doing good studies is really really hard, and most researchers are underfunded, under-experienced, undertrained and overpressured to get closure (results). At least this is what most attempts to measure study quality find – most studies are of low quality. Unfortunately, attempts to adjust for study quality have not been successful (my last kick at the can), so the summary measure would be better taken as the majority view of the (low-quality) pack rather than the best estimate of the true effect.

        Now, what Fanelli & Ioannidis are trying to model are the absolute residual from the pack and the minimum negative or maximum positive residual from the theory. The first could actually be a marker of better study quality (a study that differs from the pack of low-quality studies).
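Keith O'Rourke's comments describe two ways to look at data like these: the deviation of each study from its meta-analysis's pooled value ("the pack"), where inverse-variance and unweighted pooling can disagree, and ranking studies within each meta-analysis before pooling the ranks. Below is a minimal Python sketch of both, assuming effects are log odds ratios with known standard errors and a simple fixed-effect pool. The toy data, the function names (`pooled_estimates`, `deviations`, `within_ma_ranks`), and the use of a mean normalized rank are hypothetical illustrations. This is not Fanelli & Ioannidis's model (which also fit a random intercept per meta-analysis), and it is not anyone's actual re-analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (meta_analysis_id, is_us_study, log_odds_ratio, standard_error).
# Made-up numbers for illustration only; NOT taken from Fanelli & Ioannidis.
studies = [
    ("MA1", True, 0.60, 0.20),
    ("MA1", False, 0.30, 0.15),
    ("MA1", False, 0.25, 0.10),
    ("MA2", True, 1.10, 0.40),
    ("MA2", False, 0.50, 0.25),
    ("MA2", True, 0.70, 0.30),
]


def pooled_estimates(rows):
    """Return {ma_id: (inverse_variance_pooled, unweighted_mean)} for each meta-analysis."""
    groups = defaultdict(list)
    for ma, _, y, se in rows:
        groups[ma].append((y, se))
    out = {}
    for ma, vals in groups.items():
        weights = [1.0 / se**2 for _, se in vals]  # fixed-effect inverse-variance weights
        iv = sum(w * y for w, (y, _) in zip(weights, vals)) / sum(weights)
        out[ma] = (iv, mean(y for y, _ in vals))
    return out


def deviations(rows, weighted=True):
    """Signed and absolute deviation of each study from its own meta-analysis's pooled value."""
    pooled = pooled_estimates(rows)
    result = []
    for ma, is_us, y, _ in rows:
        center = pooled[ma][0] if weighted else pooled[ma][1]
        d = y - center
        result.append((ma, is_us, d, abs(d)))
    return result


def within_ma_ranks(rows):
    """Rank of each study's effect within its meta-analysis, scaled to [0, 1].

    0 = smallest effect, 1 = largest; ties are broken by input order (a simplification).
    """
    groups = defaultdict(list)
    for i, (ma, _, y, _) in enumerate(rows):
        groups[ma].append((y, i))
    ranks = [0.0] * len(rows)
    for vals in groups.values():
        vals.sort()
        n = len(vals)
        for r, (_, i) in enumerate(vals):
            ranks[i] = r / (n - 1) if n > 1 else 0.5
    return ranks


if __name__ == "__main__":
    devs = deviations(studies, weighted=True)
    ranks = within_ma_ranks(studies)
    for label, flag in (("US", True), ("non-US", False)):
        abs_dev = mean(a for _, us, _, a in devs if us == flag)
        avg_rank = mean(r for r, (_, us, _, _) in zip(ranks, studies) if us == flag)
        print(f"{label}: mean |deviation| = {abs_dev:.3f}, mean within-MA rank = {avg_rank:.3f}")
```

Switching `weighted=True` to `weighted=False` shows how much the choice of pooling moves each study's deviation, one of the sensitivities raised in the comments. The rank version discards magnitude entirely, which is the trade-off O'Rourke describes.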
