
Contribute to this PubPeer discussion!

Alex Gamma writes:

I’d love to get feedback from you and / or the commenters on a behavioral economics / social neuroscience study from my university (Zürich). This would fit perfectly with yesterday’s “how to evaluate a paper” post. In fact, let’s have a little journal club, one with a twist!

The twist is that I’ve already posted a critique of the study on PubPeer and just now got a response by the authors. I’m preparing a response to the response, and here’s where I could use y’all’s input.

I don’t have the energy to read the paper or the discussions but I wanted to post it here as an example of this sort of post-publication review.

Gamma continues:

Here’s the abstract to the paper:

Goal-directed human behaviors are driven by motives. Motives are, however, purely mental constructs that are not directly observable. Here, we show that the brain’s functional network architecture captures information that predicts different motives behind the same altruistic act with high accuracy. In contrast, mere activity in these regions contains no information about motives. Empathy-based altruism is primarily characterized by a positive connectivity from the anterior cingulate cortex (ACC) to the anterior insula (AI), whereas reciprocity-based altruism additionally invokes strong positive connectivity from the AI to the ACC and even stronger positive connectivity from the AI to the ventral striatum. Moreover, predominantly selfish individuals show distinct functional architectures compared to altruists, and they only increase altruistic behavior in response to empathy inductions, but not reciprocity inductions.

The exchange on PubPeer so far is here, the paper is here (from the first author’s ResearchGate page), and the supplementary material here. (Email me if you have trouble with any of these links.)

The basic set-up of the study is: 

• fMRI
• N=34 female subjects
• 3 conditions:
  • baseline (N=34)
  • induced motive “empathy” (N=17; between-subject)
  • induced motive “reciprocity” (N=17; between-subject)
• ML prediction/classification of the two motives using an SVM
• accuracy ~70%, stat. sign.
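
For a sense of scale: with 34 leave-one-out classifications, ~70% accuracy corresponds to about 24 hits. A quick binomial calculation puts that just on the significant side of chance (the 24/34 count is my assumption, and LOOCV folds are not truly independent, so this is only a rough guide — a permutation test on the labels would be the proper check):

```python
from math import comb

n, correct = 34, 24  # assumed: ~70% accuracy over 34 leave-one-out folds

# One-sided binomial tail: P(X >= 24) under chance (p = 0.5).
# Caveat: LOOCV folds are not independent, so treat this as a rough guide only.
p_value = sum(comb(n, k) for k in range(correct, n + 1)) / 2 ** n
print(round(p_value, 4))  # → 0.0122
```

So the reported significance is plausible on its face; the interesting questions are elsewhere (what the classifier is for, and whether it generalizes).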

In my comment, the main criticism was that their presentation of the results was misleading by suggesting that the motives in question had been “read off” the brain directly without already knowing them by other means. I’ve since realized that it is mainly the title that suggests so and thereby creates a context within which one interprets the rest of the paper. Without the title, the paper would be more or less OK in this regard. In any case, to say in the title that brain data “reveals” human motives suggests (clearly, to me) that these motives were not previously known. That they were “hidden” and then uncovered by examining the brain. But obviously, the prediction algorithm had to be trained on prior knowledge of the motives, so that’s not at all what happens. This is one thing I intend to argue in my response. 

But there’s more.

In the comment, I also raised issues about the prediction/machine-learning aspects, and I want to bring up more in my response to their response. These issues concern the purpose of prediction, the relationship between prediction and causal inference, generalizability, overfitting, and the scope for forking paths. So lots of interesting stuff! And since I’m not an expert (not a statistician, but with not-too-technical exposure to ML), I’d love to get input from the knowledgeable crowd here on the blog.

Before I separate the issues into chunks, I’ll outline what I gathered they did with their data. As far as neuroimaging studies go, they used quite sophisticated modeling. Below, the dashes (—) are loosely used to indicate “is input to”, “leads to”, “produces as output”, or simply “is followed by”.

  1. fMRI (“brain activity”) — GLM (empathy vs reciprocity vs baseline) — diff between two motives n.s., but diff betw. motives and baseline stat. sign. in a “network” of 3 brain areas — use of DCM (“Dynamic causal models”) to get “functional connectivity” in this network, separately for the 3 conditions

  2. DCM: uses time-series of fMRI activations to infer connectivity in the network of 3 brain areas — start w/ 28 plausible initial models, each a different combination of 7 network components (see Fig. 2A, p.1075, and Fig. S2, p.12 of the supplement) — use Bayesian model averaging to estimate parameters of the 7 components (components = strengths/direction of connections and external inputs) — end up with 14 “DCM-parameters” per subject, 7 per motive condition, 7 per baseline

  3. Prediction: compute diff between DCM-parameters of each motive vs baseline (1: emp – base; 2: rec – base) = dDCM parameters — input these into SVM to classify empathy vs reciprocity — LOOCV — classification weights for 7 dDCM params (Fig. 2B, p.1075)

  4. “Mechanistic models”: start again with 28 initial models from 2. — random-effect Bayesian model selection — average best models for each condition (emp – rec – base; Fig. 3, p.1076)
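
If my outline is right, step 3 can be caricatured in a few lines of code. This is a toy sketch on synthetic data, not their analysis: the motive “signatures”, the noise levels, and the nearest-class-mean classifier (standing in for their linear SVM) are all my assumptions; only the shape — 17+17 subjects, 7 dDCM difference parameters, leave-one-out cross-validation — follows the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# 34 subjects: 17 empathy (label 0), 17 reciprocity (label 1); 7 DCM parameters each.
n_per_group, n_params = 17, 7
y = np.array([0] * n_per_group + [1] * n_per_group)

# Hypothetical motive "signatures" added on top of a shared baseline --
# the real dDCM parameters come from Bayesian model averaging, not from here.
signature = np.zeros((2, n_params))
signature[0, 0] = 1.0              # e.g. ACC -> AI connection under empathy
signature[1, 1:3] = [1.0, 1.5]     # e.g. AI -> ACC, AI -> VS under reciprocity

base = rng.normal(size=(2 * n_per_group, n_params))      # baseline DCM parameters
motive = base + signature[y] + rng.normal(size=base.shape)
X = motive - base                  # step 3: dDCM = motive minus baseline

# Leave-one-out cross-validation; nearest-class-mean stands in for their SVM.
hits = 0
for i in range(len(y)):
    train = np.arange(len(y)) != i
    m0 = X[train & (y == 0)].mean(axis=0)
    m1 = X[train & (y == 1)].mean(axis=0)
    pred = int(np.linalg.norm(X[i] - m1) < np.linalg.norm(X[i] - m0))
    hits += pred == y[i]
accuracy = hits / len(y)
print(accuracy)
```

Note that the classifier here only ever sees the *differences* motive − baseline — which is why, as argued below, it cannot by itself deliver the per-condition mechanistic models the authors discuss.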

The paper is a mix of talk about the prediction aspect and the mechanistic insight into the neural basis of the two motives that can supposedly be gleaned from the data. There seems to be some confusion on the authors’ part as to how these two aspects are related. Which leads to the first issue.

I. Purpose of prediction

In my comment, I questioned the usefulness of their prediction exercise (I called it a “predictive circus act”). I thought the causal modeling part (DCM) was OK because it could contribute to an understanding of what, and eventually how, brain processes generate mental states. However, I didn’t think the predictive part added anything to that. (And I couldn’t help noticing that the predictive part would allow them to advertise their findings as “the brain revealing motives” instead of just “here’s what’s going on in the brain while we experience some motives”.)

What’s your take? Does the prediction per se have a role to play in such a context? 

II. Relationship between prediction and causal modeling/mechanistic insights

The authors claim that the predictive part supports or even furnishes the mechanistic (causal?) insight the data supposedly deliver, although that is not stated as the official purpose of the predictive part. They write: 

“We obtain these mechanistic insights because the inputs into the support vector machine are not merely brain activations but small brain models of how relevant brain regions interact with each other (i.e., functional neural architectures)…. And it is these models that deliver the mechanistic insights into brain function…”

The last sentence of the paper then reads: 

“Our study, therefore, also demonstrates how “mere prediction” and “insights into the mechanisms” that underlie psychological concepts (such as motives) can be simultaneously achieved if functional neural architectures are the inputs for the prediction.”

But if my outline of their analytic chain is correct, these statements are confused. As a matter of fact, they do *not* derive their mechanistic models (i.e. the specific connectivity parameters of the network of 3 brain areas, see Fig. 3 p.1076) from the predictive model. The mechanistic models are the result of a different analytic path than the predictive model. This can already be seen from the fact that the predictive model is based on *differences* between motive and baseline conditions, while the mechanistic models they discuss at length in the paper exist for each of these conditions separately. 

If all this is right, the authors misunderstand their own analysis. 
(They also have *this* sentence, which I consider a tautology: “Thus, by correctly predicting the induced motives, we simultaneously determine those mechanistic models of brain interaction that best predict the motives.”)

I would be happy, however, if someone found this interesting enough to check whether my understanding of the modeling procedure is correct. 

III. Generalizability

The authors make much of their use of LOOCV: 

“We predicted each subject’s induced motive with a classifier whose parameters were not influenced by that subject’s brain data… Instead, the parameters of the classifier were solely informed by other subjects’ brain data. This means that the motive-specific brain connectivity patterns are generalizable across subjects. The distinct and across-subject–generalizable neural representation of the different motives thus provides evidence for a distinct neurophysiological existence of motives.”

They do not address at all, however, the issue of generalizability to new samples (all the more important given the single-sex sample). I thought the emphasis was completely wrong here. My understanding was and is that achieving decent in-sample classification accuracy is only the smallest part of finding a robust classifier. The real test is performance in new samples from new populations. Also, I felt something was wrong with their particular emphasis on how cool it is that LOOCV leads to a classifier that generalizes within the sample.

I wrote that “the authors’ appeal to generalizability is misleading. They emphasize that their predictive analysis is conducted using a particular technique (called leave-one out-cross-validation or LOOCV) to make sure the resulting classifier is “generalizable across subjects”. But that is rather trivial. LOOCV and its congeners are a standard feature of predictive models, and achieving a decent performance within a sample is nothing special.” 

In their response, they challenged this: 

“Well, if it is so easy to achieve a decent predictive performance, why do the behavioral changes in altruistic behavior induced by the empathy and the reciprocity motive enable only a very poor predictability of the underlying motives? On the basis of the motive-induced behavioral changes the classification accuracy of the support vector machine is only 41%, i.e., worse than chance. And if achieving decent predictive performance is so easy, why is it then impossible to predict better than chance based on brain activity levels for those network nodes for which brain connectivity is highly predictive (motive classification accuracy based on the level of brain activity = 55.2%, P = 0.3). The very fact that we show that brain activity levels are not predictive of the underlying motives means that we show – in our context – the limits of traditional classification analyses which predominantly feed the statistical machine with brain activity data.”

What they say certainly shows that you can’t get a good classifier out of just any features you have in the data. So in that sense my statement would be false. But what I had in mind was more along the lines that finding *some* good predictors among many features is nothing special. But is this true? And is it true for their particular study? This will come up again under forking paths later.

To get back to the bigger issue, was I right to assume that getting a decent classifier in a small sample is not even half the battle if you want to say something general about human beings?

(To be fair to the authors, they state in their response that it would be desirable, even “very exciting”, to be able to predict subjects’ motives out-of-sample.)
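
To make the variance point concrete, here is a toy simulation (all the numbers are my assumptions, and the nearest-class-mean classifier again stands in for their SVM) that repeatedly draws 17-vs-17 samples from one and the same population with a fixed true effect and records the LOOCV accuracy each time:

```python
import numpy as np

rng = np.random.default_rng(1)

def loocv_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-class-mean classifier."""
    hits = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        m0 = X[train & (y == 0)].mean(axis=0)
        m1 = X[train & (y == 1)].mean(axis=0)
        hits += (np.linalg.norm(X[i] - m1) < np.linalg.norm(X[i] - m0)) == y[i]
    return hits / len(y)

# Repeatedly draw 17-vs-17 samples from the SAME population (7 features,
# one fixed true effect) and record the LOOCV accuracy each time.
accs = []
for _ in range(200):
    y = np.array([0] * 17 + [1] * 17)
    X = rng.normal(size=(34, 7))
    X[y == 1, 0] += 1.0            # assumed true effect on one feature
    accs.append(loocv_accuracy(X, y))
accs = np.array(accs)
print(accs.mean(), accs.min(), accs.max())
```

The spread between the minimum and maximum shows how loosely a single accuracy estimate from n=34 pins down even same-population performance — and all of this is before asking about new samples from new populations, which is the question that actually matters.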

IV. Overfitting and forking paths

Finally, what is the scope for overfitting, noise mining and forking paths in this study? I would love to get some expert opinion on that. They had 17 subjects per motive condition. They first searched for stat. sign. differences in brain activity between the 3 conditions. What shows up is a network of 3 brain regions. They attached to it 7 connectivity parameters and tested 28 combinations of them (“models”). Bayesian model averaging yielded averages for the 7 parameters, per condition. Subtract baseline from motive parameters, feed the differences into an SVM.

Can you believe anything coming from such an analysis?
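
As a toy illustration of how selection before cross-validation can manufacture accuracy (the feature counts and the nearest-class-mean classifier are my assumptions, and this is an illustration of the general hazard, not a claim about what the authors did): generate pure noise, pick the features most correlated with the labels using the full data, and only then run LOOCV.

```python
import numpy as np

rng = np.random.default_rng(2)

# Pure noise: 34 subjects, 17 per "motive", 1000 candidate features.
# The labels carry NO real signal whatsoever.
y = np.array([0] * 17 + [1] * 17)
X = rng.normal(size=(34, 1000))

# Forking-paths step: pick the 7 features most correlated with the labels
# using ALL the data (analogous to selecting regions/models before the SVM).
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
X_sel = X[:, np.argsort(corr)[-7:]]

# Now do "honest-looking" LOOCV on the pre-selected features.
hits = 0
for i in range(len(y)):
    train = np.arange(len(y)) != i
    m0 = X_sel[train & (y == 0)].mean(axis=0)
    m1 = X_sel[train & (y == 1)].mean(axis=0)
    hits += (np.linalg.norm(X_sel[i] - m1) < np.linalg.norm(X_sel[i] - m0)) == y[i]
accuracy = hits / len(y)
print(accuracy)  # typically far above the 50% chance level, despite zero signal
```

Redo the feature selection inside each LOOCV fold and the accuracy collapses back to chance — which is the whole point: any selection step performed on the full data leaks the labels into the “cross-validated” estimate.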

I hope and believe that this could also bring the familiar insights and excitement of a journal club that so many of you have professed your love for. And last but not least, maybe the scientific audience of PubPeer could learn something, too.

I have no idea, but, again, I wanted to share this as an example of a post-publication review.


  1. Tom Passin says:

    Just the other day Andrew was talking about using ML results as input to a model. Here, they use model results as input to a ML system. Interesting reversal.

  2. Statsgirl says:

    I enjoy your blog, but the posts where you quote someone else for paragraphs and just add “I don’t know” at the end are my least favorite genre. I come here for your insights, not other people’s ramblings. But I guess everyone deserves a day off…

  3. Alex Gamma says:

    Update: meanwhile, I’ve published a response to their response, which you’ll find at the link in Andrew’s post or here:

    One thing I forgot to say is that the article’s title is “The brain’s functional network architecture reveals human motives”. Quite misleading, if you already know (or assume you know) these motives beforehand and merely train an algorithm to reproduce what you already know.

  4. Aftab Chitalwala says:

    Isn’t this the opposite of what we would expect based on the usual meaning of the word selfish:

    Moreover, predominantly selfish individuals show distinct functional architectures compared to altruists, and they only increase altruistic behavior in response to empathy inductions, but not reciprocity inductions.

    I would think that a selfish individual would be more responsive to reciprocity (when they get something in return) than empathy (when they don’t). Or am I totally missing something?

  5. Matt Skaggs says:

    A postpub critique should be polite, substantive, articulate and concise. Alex is certainly articulate!

    I agree with Alex about the use of the word “predict.” If I were in an unfamiliar mountain range and a friend told me that the two mountains to my left were over 10,000 feet, it would be an odd use of language for me to point to a mountain to the right and say “I predict that that mountain is also over 10,000 feet.” But that is semantics and not particularly substantive. Is Alex hoping we will give him something substantive?

    Alex asked:

    “They had 17 subjects per motive condition. They first searched for stat. sign. differences in brain activity between the 3 conditions. What shows up is a network of 3 brain regions. They attached to it 7 connectivity parameters and tested 28 combinations of them (“models”). Bayesian model averaging yielded averages for the 7 parameters, per condition. Subtract baseline from motive parameters, feed the differences  into an SVM. 

    Can you believe anything coming from such an analysis?”

    I can believe that the authors thought that a really sophisticated mathematical treatment would make the paper seem more sophisticated. The potion doesn’t work on me; the unnecessary “model” stuff when making simple input/output comparisons (contrived situation/brain response) just looks like cover for noise mining.
