“However noble the goal, research findings should be reported accurately. Distortion of results often occurs not in the data presented but . . . in the abstract, discussion, secondary literature and press releases. Such distortion can lead to unsupported beliefs about what works for obesity treatment and prevention. Such unsupported beliefs may in turn adversely affect future research efforts and the decisions of lawmakers, clinicians and public health leaders.”

David Allison points us to this article by Bryan McComb, Alexis Frazier-Wood, John Dawson, and himself, “Drawing conclusions from within-group comparisons and selected subsets of data leads to unsubstantiated conclusions.” It’s a letter to the editor for the Australian and New Zealand Journal of Public Health, and it begins:

[In the paper, “School-based systems change for obesity prevention in adolescents: Outcomes of the Australian Capital Territory ‘It’s Your Move!’”] Malakellis et al. conducted an ambitious quasi-experimental evaluation of “multiple initiatives at [the] individual, community, and school policy level to support healthier nutrition and physical activity” among children.1 In the Abstract they concluded, “There was some evidence of effectiveness of the systems approach to preventing obesity among adolescents” and cited implications for public health as follows: “These findings demonstrate that the use of systems methods can be effective on a small scale.” Given the importance of reducing childhood obesity, news of an effective program is welcome. Unfortunately, the data and analyses do not support the conclusions.

And it continues with the following sections:

Why within-group testing is misleading

Malakellis et al. reported a “significant decrease in the prevalence of overweight/obesity within the pooled intervention group (p<0.05) but not the pooled comparison group (NS) (Figure 2)”. This kind of analysis, known as differences in nominal significance (DINS) analysis, is “invalid, producing conclusions which are, potentially, highly misleading”. . . .

Why drawing conclusions from subsets of data selected on the basis of observed results is misleading

Ideally, all analyses would be clearly described as having been specified a priori or not, so that readers can best interpret the data. Despite reporting no significance for the overall association, Malakellis et al. highlighted the results of the subgroup analyses as a general effect overall. Further complicating matters, the total number of subgroup analyses were unclear. It is also uncertain whether the analyses were planned a priori or after the data were collected and viewed. . . . Other problems arise when subgroup analyses are unrestricted, which is a multiple comparisons issue. . . .

Spin can distort the scientific record and mislead the public

Although Malakellis et al. may have presented their data accurately, by including statements of effectiveness based on a within-group test instead of relying on the proper between-group test, the article did not represent the findings accurately. The goal of reducing childhood obesity is a noble one. . . . However noble the goal, research findings should be reported accurately. Distortion of results often occurs not in the data presented but, as in the current article, in the abstract, discussion, secondary literature and press releases. Such distortion can lead to unsupported beliefs about what works for obesity treatment and prevention. Such unsupported beliefs may in turn adversely affect future research efforts and the decisions of lawmakers, clinicians and public health leaders.

They conclude:

Considering the importance of providing both the scientific community and the public with accurate information to support policy decisions and future research, erroneous conclusions reported in the literature should be corrected. The stated conclusions of the article in question were not substantiated by the data and should be corrected.

Well put. The problems identified by McComb et al. should be familiar to regular readers of this blog, as they include the principle that the difference between “significant” and “not significant” is not itself statistically significant, the garden of forking paths, and story time.
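The first of these fallacies is easy to demonstrate numerically. Here is a minimal sketch, with made-up numbers (not taken from Malakellis et al.), of how one group’s change can clear p&lt;0.05 while the other’s doesn’t, even though the two groups barely differ from each other:

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical estimates: both groups improve by a similar amount,
# with the same standard error.
est_a, se_a = 2.0, 1.0   # "intervention" group
est_b, se_b = 1.5, 1.0   # "comparison" group

p_a = two_sided_p(est_a / se_a)   # about 0.046: nominally "significant"
p_b = two_sided_p(est_b / se_b)   # about 0.134: "not significant"

# The proper between-group test compares the difference directly:
diff = est_a - est_b
se_diff = math.sqrt(se_a**2 + se_b**2)
p_diff = two_sided_p(diff / se_diff)  # about 0.72: no evidence of a difference
```

One group lands just under 0.05 and the other just over, yet the direct test of the between-group difference shows essentially nothing, which is why the DINS comparison is invalid.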

I particularly like this bit: “However noble the goal, research findings should be reported accurately.” That was one of the things that got tangled in discussions we’ve had of various low-quality psychology research. The research has noble goals. But I don’t think those goals are served by over-claiming and then minimizing the problems with those claims. You really have to go back to first principles. If the published research is wrong, it’s good to know that. And if the published research is weak, it’s good to know that too: it’s the nature of claims supported by weak evidence that they often don’t replicate.

Allison also pointed me to the authors’ response to their letter. The authors of the original paper are Mary Malakellis, Erin Hoare, Andrew Sanigorski, Nicholas Crooks, Steven Allender, Melanie Nichols, Boyd Swinburn, Cal Chikwendu, Paul Kelly, Solveig Petersen, and Lynne Millar, and they write:

The paper describes one of the first attempts to evaluate an obesity prevention intervention that was informed by systems thinking and deliberately addressed the complexity within each school setting. A quasi-experimental design was adopted, and the intervention design included the facility for each school to choose and adopt interventions that were specific to their school context and priorities. This, in turn, meant the expectation of differential behavioural effects was part of the initial design and therefore a comparison of outcomes by intervention school was warranted. . . . Because of the unique and adaptive nature of intervention within each school, and the different intervention priority in each school, there was an a priori expectation of differential results and we therefore investigated reports within schools’ changes.

This is fine. Interactions are important. You just have to recognize that estimates of interactions will be more variable than estimates of main effects, thus you can pretty much forget about establishing “statistical significance” or near-certainty about particular interactions.
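To see how much noisier interaction estimates are, consider a balanced two-by-two design (a simplified stand-in, not the study’s actual model): the interaction contrast has twice the standard error of a main-effect contrast built from the same cell means.

```python
import math, random

# In a balanced 2x2 design with n observations per cell and residual
# sd sigma, both the main effect and the interaction are contrasts of
# the four cell means, but the interaction has twice the standard error:
#   main effect of A: (m11 + m10)/2 - (m01 + m00)/2 -> SE = sigma/sqrt(n)
#   interaction:      (m11 - m10) - (m01 - m00)     -> SE = 2*sigma/sqrt(n)
sigma, n = 1.0, 50
se_main = sigma / math.sqrt(n)
se_inter = 2 * sigma / math.sqrt(n)
print(se_inter / se_main)  # 2.0

# Check empirically: simulate cell means under no true effects.
random.seed(1)
def cell_mean():
    return sum(random.gauss(0, sigma) for _ in range(n)) / n

mains, inters = [], []
for _ in range(2000):
    m00, m01, m10, m11 = (cell_mean() for _ in range(4))
    mains.append((m11 + m10) / 2 - (m01 + m00) / 2)
    inters.append((m11 - m10) - (m01 - m00))

def sd(xs):
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / (len(xs) - 1))

print(sd(inters) / sd(mains))  # close to 2
```

With double the standard error, an interaction needs to be twice as large as a main effect to reach the same level of statistical significance, so certainty about any particular school-level interaction is hard to come by.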

Malakellis et al. continue:

Our conclusion used qualifying statements that there was “some evidence” of within-school changes but no interaction effect, and that the findings were “limited”.

Fair enough—if that’s what they really did.

Let’s check, going back to the original article. Here’s the abstract, in its entirety:

OBJECTIVE: The Australian Capital Territory ‘It’s Your Move!’ (ACT-IYM) was a three-year (2012-2014) systems intervention to prevent obesity among adolescents.

METHODS: The ACT-IYM project involved three intervention schools and three comparison schools and targeted secondary students aged 12-16 years. The intervention consisted of multiple initiatives at individual, community, and school policy level to support healthier nutrition and physical activity. Intervention school-specific objectives related to increasing active transport, increasing time spent physically active at school, and supporting mental wellbeing. Data were collected in 2012 and 2014 from 656 students. Anthropometric data were objectively measured and behavioural data self-reported.

RESULTS: Proportions of overweight or obesity were similar over time within the intervention (24.5% baseline and 22.8% follow-up) and comparison groups (31.8% baseline and 30.6% follow-up). Within schools, two of the three intervention schools showed a significant decrease in the prevalence of overweight and obesity (p&lt;0.05).

CONCLUSIONS: There was some evidence of effectiveness of the systems approach to preventing obesity among adolescents. Implications for public health: The incorporation of systems thinking has been touted as the next stage in obesity prevention and public health more broadly. These findings demonstrate that the use of systems methods can be effective on a small scale.

After reading this, I’ll have to say: No, they did not sufficiently qualify their claims. Yes, their Results section clearly indicates that the treatment and comparison groups were not comparable and that there were no apparent main effects. But it’s inappropriate to pick out some subset of comparisons and label them as “p&lt;0.05.” Multiple comparisons is a real problem. My concern here is not “Type 1 errors” or “Type 2 errors” or “false rejections” or “retaining the null hypothesis.” My concern is that from noisy data you’ll always be able to see patterns, and there’s no reason to believe that these noisy patterns tell us anything beyond the people and measurements in this particular dataset.
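A back-of-the-envelope calculation shows how easily “significant” subgroups appear by chance alone. If the intervention did nothing at all, and each of k school-level comparisons were an independent test at the 0.05 level (an idealization; the actual number and dependence of the tests are unclear), the chance of at least one nominally significant result grows quickly with k:

```python
# Chance of at least one p < 0.05 among k independent null tests:
# 1 - (1 - alpha)^k. An idealized sketch, not the study's actual
# testing structure.
alpha = 0.05
false_positive_rates = {k: 1 - (1 - alpha) ** k for k in (1, 3, 6, 10)}
for k, p in false_positive_rates.items():
    print(f"{k} tests: {p:.1%} chance of at least one 'significant' result")
```

With six tests, the chance of at least one spurious “p&lt;0.05” is already about one in four, which is why picking out the significant subgroups after the fact tells us so little.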

And then in the conclusions, yes, they say “some evidence.” But then consider the final sentence of the abstract, which for convenience I’ll repeat here:

These findings demonstrate that the use of systems methods can be effective on a small scale.

No no NO NO NOOOOOOO!

I mean, sure, they got excited when they were writing their article and this sentence slipped in. Too bad, but such things happen. But then they were lucky enough to receive thoughtful comments from McComb et al., and this was their chance to re-evaluate, to take stock of the situation and correct their errors, if for no other reason than to help future researchers not be led astray. And did they do so? No, they didn’t. Instead they muddied the waters and concluded their response with, “While we grapple with intervention and evaluation of systems approaches to prevention, we are forced to use the methods available to us which are mainly based on very linear models.” Which completely misses the point that they overstated their results and made a claim not supported by their data. As McComb et al. put it, “The stated conclusions of the article in question were not substantiated by the data and should be corrected.” And the authors of the original paper, given the opportunity to make this correction, did not do so. This behavior does not surprise me, but it still makes me unhappy.

Who cares?

What’s the point here? A suboptimal statistical analysis and misleading summary appeared in an obscure journal published halfway around the world? (OK, not so obscure; I published there once.) That seems to fall into the “Someone is wrong on the internet” category.

No, my point is not to pick on some hapless authors of a paper in the Australian and New Zealand Journal of Public Health. I needed to check the original paper to make sure McComb et al. got it right, that’s all.

My point in sharing this story is to foreground this quote from McComb et al.:

However noble the goal, research findings should be reported accurately. Distortion of results often occurs not in the data presented but, as in the current article, in the abstract, discussion, secondary literature and press releases. Such distortion can lead to unsupported beliefs about what works for obesity treatment and prevention. Such unsupported beliefs may in turn adversely affect future research efforts and the decisions of lawmakers, clinicians and public health leaders.

This is a message of general importance. It seems to be pretty hopeless to get researchers to correct the errors they’ve made in published papers, but maybe this message will get out there to students and new researchers who can do better in the future.

???

Really, what’s up with people? Everyone was a student, once. And as a student you make mistakes: mistakes in class, mistakes on your homework, etc. What makes people think that, suddenly, once they graduate and have a job, they can’t make serious mistakes in their work? What makes people think that, just because a paper has their name on it and happens to be published somewhere, it can’t have a serious mistake? The whole thing frankly baffles me. I make mistakes, I put my work out there, and people point out errors that I’ve made. Why do so many researchers have problems doing the same? It’s baffling. I mean, sure, I guess I understand it from a psychological perspective: people have their self-image, they can feel they have a lot to lose by admitting error, etc. But from a logical perspective, it makes no sense at all.

15 Comments

  1. jrc says:

“It seems to be pretty hopeless to get researchers to correct the errors they’ve made in published papers, but maybe this message will get out there to students and new researchers who can do better in the future.”

I think students tend to imitate the writing of their professors and the major figures in their field. And I see nothing happening now to interrupt the intergenerational transmission of ¡Hype! writing. Sure, you or I might make snarky comments on their work, tell them to tone down the text, show them how/where they are writing ¡Hype! and where they are writing clearly, humbly and with solid grounding… but they know that the minute they are competing in the academic world outside our sphere, they do better to write the way everyone else does. It’s sorta like steroids and bike racing: once a critical mass of competitors is doing it, that’s just the nature of the sport. No matter how much you shame a Lance Armstrong or ban a Russian Olympic delegation, it is just part of the race now.

    You think students/athletes are gonna listen to the one sad-sack coach who tells them only hard work, discipline and talent matter…or will they take what they can from that coach, and then go see that other coach who sells steroids in the shower stalls after practice. Because that’s almost certainly optimal behavior if their objective function is “most prestigious academic career possible”. Now for the students with better values who want to optimize for “living a life worth living” or something like that…well, we got a chance with some of them. But then the prestige and the money and intangibles of position and rank (not to mention it is hard to write clearly and carefully)… it takes a special student these days to harm their own career in the interests of “truth and honesty in research reporting” when very few of the successful people around them do that. All I can do is give those students more time and attention and effort in advising and on the job market. That’s something, but I suspect even that isn’t enough to help the profession escape this cycle of results-doping.

    • Andrew says:

      Jrc:

      One difference between scientific research and bike racing is that in scientific research there’s no unambiguous ranking of winners and losers. Also, research has external goals, unlike racing where the only goal is the internal goal of winning.

Also, I don’t think the problem is just careerism. My guess is that the above-linked authors who don’t admit their errors are not just acting out of careerism; I’m guessing they somehow really think, or have convinced themselves to think, that they’ve made no mistakes.

      • jrc says:

        I agree. In the absence of careerism we’d still have the problems of narcissism and ignorance and those would lead to similar issues in writing. But I really do think we pass the careerism on to our students by perpetuating the ¡Hype!-cycle, and they internalize it from the world they see around them, and I don’t see much happening to change that. So my bike racing analogy was imperfect, but I still think you are being optimistic by thinking that the next generation will do better when what they see every day from the successful past generations is that ¡Hype! is the way to go.

        More optimistic me might see hope in places like this blog, and in watching those of my students who are willing to actually wrestle with research, learn what they can, figure out what they can’t learn from what they are doing, and try to be clear and careful and honest in their reporting. But this is a pretty small world, and more pessimistic me sees it as a drop in the bucket. When professional (tenure, publication) and personal (narcissism, laziness) incentives align against social ones (generating knowledge, teaching others)… well, then the social pressure has to be really strong. That does not really seem to be the case right now.

  2. Jordan Anaya says:

    Next time you talk to Allison maybe you can ask him about his good friend Wansink:
    https://www.youtube.com/watch?v=-_roZK6oKQo#t=36m30s

    • Andrew says:

      Jordan:

      It’s understandable that people defend their friends, but I disagree with what Allison said in that clip, and in particular I disagree with his characterization of your writing as “character assassination.” Allison analogizes your behavior to that of a professor violating a student’s privacy by talking in public about suspected cheating. The difference, I think, is that Wansink’s errors were in published papers; in addition, Wansink has described his work publicly with various clearly documented untruths. So I don’t think anything you’re doing is inappropriate.

      • anon says:

        Jordan’s title and first few paragraphs kinda set a certain tone. Not sure they’re helpful. Guess he’d say no one would read it without them.

        • Andrew says:

          Anon:

          It’s perfectly reasonable to say that Jordan could’ve written his article better, or that you would’ve made different choices in writing it, or that some of what he wrote is counterproductive for some audiences. I just don’t think what Jordan wrote is “character assassination” or anything like it. There are different ways to make a point, and Jordan did it the way he wanted, which is fine with me. I’m fine with other people expressing the same points in other ways.

        • Jordan Anaya says:

          I’d say I knew anon would say I would say that, but it’s hard to keep track of all the anonymous commenters on this blog.

    • anon says:

      Not that this is a competition, but there are many far worse cases than Wansink.

      e.g. Colin Campbell of “The China Study” (2006) fame.
      He’s far worse because of bias, self-contradiction, etc. Even sticking to the statistical aspects…
People seem to think it’s awesome that they studied 367 variables and found 8,000 statistically significant correlations. How’s that for p-hacking, huh?

      Campbell’s response to criticism is to attack people & boast about his book sales, etc.
      e.g. (written 2009):

      “I ran a relatively large experimental research laboratory for many years before the China Study and published our work in more than 300 peer-reviewed publications, mostly in top medical and scientific journals. Further, this research was generously funded almost entirely by NIH, which means that it was rigorously reviewed at several levels.

      I am not a fan of scientific research ‘proving’ anything, thus I really don’t understand the rather rambling discourse on this thread about what constitutes ‘proof’. If that is the standard to be followed in science, we mostly will end up with a huge collection of narrowly focused observations that create more confusion than clarification. Rather, my standard is to do research from a wide vantage point, so as to gain perspective. Indeed, this is essential if one is ever to truly understand biology, especially nutrition.

      Why didn’t she also note those who have praised the book like a Nobel laureate and Vice Provost at Cornell and a very distinguished, 17-year President of Cornell.

      Since publishing the book over 4 years ago, it continues to sell at a rate even higher than it did in the beginning, now almost 4 times a national best seller. I have given almost 300 lectures, mostly to medical and professional venues and conferences and the feedback that I get is rather overwhelming. People are telling me how they respond in so many ways, involving all kinds of disease outcomes and health enhancements. For any doubters, read the 500+ unsolicited reviews on Amazon.”

    • Carol says:

      Speaking of David Allison: See today’s RETRACTION WATCH.

      Also of interest is Susan Fiske’s description of the content analysis that she is doing of “methods blogs.”

      https://www.youtube.com/watch?v=8ykftugZ44Y&list=PLGJm1x3XQeK0FeRdgKcyvyBH8TKUBtwFv&index=12&t=143s

      This talk is from the same colloquium in which David Allison gave the talk in which he described Jordan as a “character assassin.”

      Carol
