
Tutorial: The practical application of complicated statistical methods to fill up the scientific literature with confusing and irrelevant analyses

James Coyne pointed me with distress or annoyance to this new paper, “Tutorial: The Practical Application of Longitudinal Structural Equation Mediation Models in Clinical Trials,” by K. A. Goldsmith, D. P. MacKinnon, T. Chalder, P. D. White, M. Sharpe, and A. Pickles. This is the team behind the PACE trial for systemic exercise intolerance disease.

Some of the words in the abstract seem ok:

Repeated measurements allow the application of various types of longitudinal structural equation mediation models. These provide flexibility in modeling, including the ability to incorporate some types of measurement error and unmeasured confounding that can strengthen the robustness of findings. . . . In this tutorial, we outline how to fit several longitudinal mediation models, including simplex, latent growth and latent change models. . . .

The trouble is that these good intentions don’t do anything for you if the models don’t make sense. It would be as if I tried to build a nuclear reactor at home and wrote a fancy-sounding prospectus with a lot of fine phrases about atomic theory and the forces on the nucleus. It might do the job to get my article published in the Lancet but it wouldn’t actually be producing nuclear power.

What’s striking here is that:

1. They’re giving a tutorial in mediation models, but they don’t seem to understand these methods, or at least be able to explain them clearly. For example, the above figure from this paper just seems like bad news. I can’t imagine it ever being a good idea to do this sort of thing.

2. They still won’t share their data! From the abstract:

We use the Pacing, Graded Activity, and Cognitive Behavioral Therapy: A Randomized Evaluation (PACE) trial of rehabilitative treatments for chronic fatigue syndrome (ISRCTN 54285094) as a motivating example . . . The simulated data set and Mplus code and output are provided.

I asked Coyne if these researchers said there were some legal or confidentiality rules whereby they could not share their data, and he replied: Yes but that was soundly rejected by a tribunal. So I don’t know whassup with that.

Psychological Methods is a respected journal, I thought!

P.S. There’s some discussion in comments about whether I (and, I suppose, Coyne) are being unfair by saying how we don’t believe these methods, but without giving detailed criticism. Maybe Coyne and I are wrong; that’s certainly possible. But the real point is the science. It’s not about me, or Coyne, or even the authors of this paper; it’s about understanding the effects of these medical treatments. And, the thing is, I don’t see any reason to think that the methods of mediation analysis discussed above will answer the applied questions of interest here. Real people are suffering from systemic exercise intolerance disease, and I think the burden is on these researchers to demonstrate that these complicated and controversial procedures make sense. It’s not our job to untangle these methods.

55 Comments

  1. Matt Skaggs says:

    “I can’t imagine it ever being a good idea to do this sort of thing.”

    Probably not. But it is a common response to the “sophistication syndrome.” This is where a modest research goal is set in a churning sea of ignorance, and a simple model is constructed to achieve that goal. Well aware that even baseline knowledge of the basic relationships between the variables is lacking, you consider the simplicity of your model a feature, not a bug. Then your rivals publish a critique of your simple model, scorching you for being overly simplistic. Reviewers, not really understanding the relative merits of either argument, side with your rivals because…well…if they pick the simple approach, they might raise suspicion that they don’t really understand the sophisticated approach.

    The result is meaningless sophistication creep.

  2. Ram says:

    I’m not sure I understand what your complaint about the paper is.

    “The practical application of complicated statistical methods to fill up the scientific literature with confusing and irrelevant analyses”

    Confusing is perhaps in the eye of the beholder, but irrelevant? Always? Why do you think so?

    “The trouble is that these good intentions don’t do anything for you if the models don’t make sense. It would be as if I tried to build a nuclear reactor at home and wrote a fancy-sounding prospectus with a lot of fine phrases about atomic theory and the forces on the nucleus. It might do the job to get my article published in the Lancet but it wouldn’t actually be producing nuclear power.”

    What does this paragraph mean? Are you saying they advise people to use specific types of models even when they don’t make sense?

    “They’re giving a tutorial in mediation models, but they don’t seem to understand these methods, or at least be able to explain them clearly. For example, the above figure from this paper just seems like bad news. I can’t imagine it ever being a good idea to do this sort of thing.”

    What’s the evidence they don’t understand the methods, or are unable to explain them clearly? That they produced a figure you don’t care for?

    I haven’t read the paper, and don’t have a dog in this fight, but I can’t figure out why you take a negative view of it from your post.

    • Anon says:

      Exactly. This blog post is only detailed enough to understand there’s general dislike for these models and how people (or at least some people) are presenting them without much explanation or justification.

    • JFA says:

      Yeah… I don’t get the hostility… might just be because the authors are the same as the PACE study, and Andrew doesn’t like their previous behavior in terms of data sharing.

    • Andrew says:

      Ram, Jfa:

      Yes, I think the above figure is a disaster. This has nothing to do with data sharing. All the data could be open, the whole study could be on Github, whatever, I’d still think that all those p-values attached to arrows are mistakes, at best a waste of time and at worst actively misleading people. Cargo cult science, the whole bit. A phrase such as “flexibility in modeling, including the ability to incorporate some types of measurement error and unmeasured confounding that can strengthen the robustness of findings” sounds good, but in this case I think it’s pretty meaningless, it’s what Lakatos would call a degenerative research program.

      Just to be clear: I don’t consider the above paragraph, or the above post, to be a proof of said claim, or even an argument for it. I’m just expressing my view. I and others have written enough in various places about the problems with such mediation analyses.

      • Ram says:

        You’re entitled to your view, of course, and this is your blog, so no complaints from me on either front.

        But given that many, many scientists trust your takes, it seems you owe it to these authors to say more than mediation modeling is a degenerative research program. You can say, “I haven’t given the paper a close look, but here it is for those interested in this sort of thing. I’m personally skeptical of these types of methods, however.” Or you can say, “Here are specific critiques of *this* paper based on my close reading…” Instead, you seem to insult the authors and criticize the paper simply based on your view that the topic area is bunk, when you know this is hardly a point of consensus.

        Your blog, your takes–fair enough. But I’m inclined to think the authors deserve better.

        • Andrew says:

          Ram:

          You write, “the authors deserve better.” It’s not about them, or about me. The goal is to help people with systemic exercise intolerance disease or, more generally, to understand how this disease works and how it can be cured, or how its symptoms can be alleviated. I don’t think the methods described in the above-linked paper will help with that. That’s not an “insult” to anyone; it’s just the way it is. It’s not that I’m personally skeptical; it’s that these methods just can’t do what they’re sold as doing.

          I recognize that this is hardly a point of consensus—if it were a consensus, that article never would’ve been published in a serious journal! Unfortunately, there are various statistical methods out there that people use in order to get conclusions which can’t really be supported by their data. Various people have written about problems with mediation analysis, and I don’t have the energy to repeat all these points here. In the meantime, I can see the appeal of these methods, and I’m not trying to make a case against them here. If you want to be fair-minded, you can feel free to believe all the claims represented in the graph reproduced in the above post. I think that would be a mistake on your part—and you might ask yourself why you should think that this pile of numbers is really telling you about causal effects—but it’s your call.

          • Ram says:

            To be clear, the insult I was referring to was the claim that they don’t understand the methods they’ve written a tutorial about. Also, I’m not suggesting it’s “unfair-minded” to not believe the claims of mediation analysis. I’m simply saying if you write a post insulting *particular* authors about a *particular* paper they’ve written, they deserve more than a general dismissal of the topic area they’ve written about.

            You may well be right about the methods as a general matter, but I think that’s not really relevant here, given that this post is about a particular paper by particular authors. Lots of people who may have otherwise gone into this paper with an open mind may not do so now, since they trust your take, and if the paper does contain some useful ideas or points, then that would represent a setback for the beneficiaries of good research, such as the patients you mentioned.

      • Is it the p values attached to the arrows that’s getting you, or the general concept of mediation analysis?

        If your main concern is that people are taking a complex idea and then trying to convert it into a certainty factor (p = 0.02 so mediating effect ABC is real and nearly equal to the estimated quantity given in chart Q) then I’m totally with you on that. Regardless of what you think about the idea of mediation and modeling it… doing an analysis by converting things into “real because p is less than 0.05” and “not real because p is greater than 0.05” is a disaster, everywhere it’s done… which is basically everywhere.

        • Andrew says:

          Daniel:

          It’s both things. I don’t think the model underlying the mediation analysis makes sense. In addition, summarizing by p-values is indeed a disaster. The whole thing looks to me like a bunch of tools designed to answer some questions which, though important, are essentially unanswerable from the data at hand. And this connects to some of the larger concerns about statistically-based scientific research, that large and certain-sounding claims are made from indirect data.

          • To me the whole issue comes about because people in these fields typically aren’t thinking about models of mechanism until they fail to get the p value they want in the bog-standard stupid direct regression of Z vs X. They’re typically just “measuring differences” rather than thinking mechanistically, and then suddenly someone comes along and says “hey we could actually think about how this happens and here are some diagrams to help you” and people bolt on to the idea of “thinking about how this happens” all the baloney baggage of p values and real effects vs random noise and so on that came along with their standard practice, and it becomes a deus-ex-machina for saving their poorly thought out research program.

            If someone came to you and said “hey, I think when we treat people with treatment X it causes Y and then both X and Y have an effect on Z, and if you made Y happen some other way you’d still see an effect on Z… how could I possibly study that and what kind of data analysis can I do to account for this?” I suspect if they came to you as a statistical consultant with that reasonably well spelled out hypothesis, you’d be happy to tell them how to build an experiment that would test this, and a model that would estimate the effects, etc. You may or may not use the same kind of diagrams.

            On the other hand, if someone comes to you with a batch of data they’ve collected, and says “gee the measurements are noisy, and we ran the regress-o-matic on it and couldn’t find an effect that made sense to us, but when we thought about it, maybe there could be some thing kinda affecting the outcome called Y and when we include Y the regress-o-matic goes “ding” and spits out a coefficient with 3 stars, and look the p values were tiny so it must be true” you’d tell them to go home and soak their head.

            It’s not the complicated mediational effects in my opinion, it’s that they’re ad-hoc created *for the purpose of rescuing a failed research method* and then the small p values are taken as evidence of TRUTH.

            In the absence of that baggage, if people want to think hard about complicated feedback mechanisms… I say go for it.

            • Chris Wilson says:

              “It’s not the complicated mediational effects in my opinion, it’s that they’re ad-hoc created *for the purpose of rescuing a failed research method* and then the small p values are taken as evidence of TRUTH.”

              Indeed, as Andrew said – using the language of Lakatos – this cycle is a major symptom of a degenerative research program. Increasingly, I have realized that most criticism of bad stat practices (whether NHST/p-values, sloppy data mining, or naive Bayesian inference, whatever), to the extent that it is well-founded, is really just pointing out degenerative research programs. It’s not really the statistical methods themselves, rather it is their propensity for attaching to, or sometimes providing quantitative cover for, what is basically pseudo-science.

            • ojm says:

              Yeah the general diagrams like

              A to M
              M to B
              A to B

              (lazily trying to avoid HTML errors)

              seem mostly fine to me.

              (though it seems interestingly subtle to think what the individual arrows are intended to mean in this context eg in terms of defining ‘direct’ vs ‘indirect’ effects. )

              The pvalues and things though – you can leave those out, thanks.

      • Martha (Smith) says:

        ” I’d still think that all those p-values attached to arrows are mistakes, at best a waste of time and at worst actively misleading people. … A phrase such as “flexibility in modeling, including the ability to incorporate some types of measurement error and unmeasured confounding that can strengthen the robustness of findings” sounds good, but in this case I think it’s pretty meaningless”

        I’m no expert on mediation analysis, but from my semi-naive point of view, I think that if one wants to “incorporate some types of measurement error and unmeasured confounding”, then some type of uncertainty quantification would be far better than using p-values (since this situation appears to be a “garden of forking paths” one).

        • Martha (Smith) says:

          And/or “since this situation appears to be one where uncertainty (as in measurement error and unmeasured confounding) is propagated”

        • Andrew says:

          Martha:

          I think one problem is that the users of these methods are typically not aiming to quantify their uncertainty; rather, they want results that can be published with the air of certainty. I’m not being cynical here and talking about corruption. I think this is just how they’ve been trained to think of science, as a method for gathering data and routinely learning general truths. If you read journal articles, that’s what science can look like.

          • Martha (Smith) says:

            I agree. Some of my favorite quotes on the subject of uncertainty (in the hope that they may help convince some readers that do not realize that you are not cynical or talking about corruption):

            “If it involves statistical inference, it involves uncertainty.” (Me — said often to students)

            Statistics is the “Science of Uncertainty” (Noel Cressie and Christopher K. Wikle)

            “Statistics, in a nutshell, is a discipline that studies the best ways of dealing with randomness, or more precisely and broadly, variation. As human beings, we tend to love information, but we hate uncertainty — especially when we need to make decisions. Information and uncertainty, however, are actually two sides of the same coin. If I ask you to go to the airport to pick up a student you have never met, my description of her is information only because there are variations; if everyone at the airport looks identical, my description has no value. On the other hand, the same variation causes uncertainty. If all I tell you is to pick up a Chinese female student …, then my description is not informative enough because it still allows too many variations. There may be a substantial number of individuals at the airport who look like a Chinese female student.” (Xiao-Li Meng)

            “… to deal with uncertainty successfully we must have a kind of tentative humility. We need a lack of hubris to allow us to see data and let them generate, in combination with what we already know, multiple alternative working hypotheses. These hypotheses are then modified as new data arrive. The sort of humility required was well described by the famous Princeton chemist Hubert N. Alyea, who once told his class, ‘I say not that it is, but that it seems to be; as it now seems to me to seem to be.’” (Howard Wainer)

  3. sentinel chicken says:

    “They’re giving a tutorial in mediation models, but they don’t seem to understand these methods, or at least be able to explain them clearly.”

    David MacKinnon, the second author, has written extensively on mediation, but the implication of your comments seems to be that we should trust your judgement about mediation over his? You’ve spent how many years studying and writing about mediation exactly?

    Mediation is a subtle concept that many researchers struggle to fully understand. Your comments say nothing, and say even less about how much you actually understand mediation, conceptually or statistically. Maybe you’re the one who doesn’t really understand it. How would we know? Can you offer something more than lazy criticism like, ‘I can’t imagine it ever being a good idea to do this sort of thing’? Can you explain why you can’t imagine this ever being a good idea and offer some thoughts on what you would do instead?

    • Andrew says:

      Sentinel:

      I’m not claiming to understand mediation methods. But I’m not publishing a tutorial on them either. The figure reproduced above is not a good sign, no matter how many years went into its preparation.

      Regarding what I would do instead, see chapters 9 and 10 of my book with Jennifer Hill, also here, where I say that if I’m interested in many mediators, I’d want to do many different experiments or observational studies: one experiment or observational study for each causal effect of interest.

      • Wow I touched on to the causality issue myself in reference to an article tweeted on Brexit

        https://twitter.com/JadePinkSameera/status/1014931401244848129

        • Thanatos Savehn says:

          Speaking of causality, I don’t know if you’re following him yet (I saw you were following more than 11,000 peeps – Thanatos’ mind is boggled – so I didn’t check to see) but Judea Pearl is tweeting now. Very limited (and tame) so far and good stuff on RCTs done wrong and not even rightly doable, regular reminders of Nancy Cartwright’s “no causation in, no causation out” quote for the ages, etc. At this point I’m waiting for the fireworks to begin which I predict will commence upon Philip Dawid tweeting “you’ll never get p(H-true | E-some) from p(E-deduced | H-false)” at him. They haven’t settled the debate in long form so maybe reducing their arguments to 280 characters will do the trick. :/

          • Thanatos,

            Thank you. I haven’t yet watched a Philip Dawid YouTube presentation. I watched Judea Pearl and several other speakers presenting at the Judea Pearl Symposium.

            You referring to my Twitter? I typically follow anyone who follows me.

            I did not know Judea Pearl was tweeting. I do not pretend to be some super expert on causality. However I was stunned to watch one Judea Pearl Youtube video and find that we share some common observations about who may be better to apprehend causality. Of course wide ranging childhood education and engagement are two dimensions.

            Some of the arguments presented are beyond my current expertise. But I am capable of extreme focus if the subject interests me.

            • Thanatos Savehn says:

              Speaking of YouTube, I miss the early days back when I was trying to understand the Bayesian perspective and why p-values as evidence for/against causation had never made sense to me. The videos that finally helped me (kinda/sorta in my limited way) get it were made by some Russian or Ukrainian guy who’d draped sheets over what looked like his kids’ bunk bed to make a studio inside which he’d sit as he explained it all (in passable English) to me. My wife passing by once asked if it was a hostage video.

              Anyway, it was much better than the over-produced TED-ish talks typical these days. He’d get straight to the point, had few graphics but many useful examples, and didn’t waste time on STDNM. Not many people are good at identifying those circumstances when 10 minutes of listening might be more enlightening than 2 minutes of reading.

              P.S. On causation I find Miguel Hernan to be particularly helpful in translating some of Pearl’s arguments.

              • Thanatos,

                It would be very interesting to evaluate the quality/accuracy of current statistics instruction videos.

                I, for example, easily understood the concerns about statistical significance testing, more generally. But when it came to the definition of a p-value, I was confused. All in all I found Rex Kline’s video very good as a general introduction to statistical significance testing.

                https://vimeo.com/114891922

                I see quite a few references to Miguel Hernan. I will access his videos. Thank you.

                My 1st intros to the controversies in statistics were through John Ioannidis, Daniel Lakens, Stephen Senn, Sander Greenland, Steven Goodman. Since then I have naturally followed Andrew Gelman, Frank Harrell, Deborah Mayo, Richard Morey, Keith O’Rourke, and others posting to Andrew’s blog.

      • Anon says:

        “I’m not claiming to understand mediation methods.”

        Shouldn’t understanding a method be a prerequisite for criticizing it?

        • Andrew says:

          Anon:

          It takes a lot less understanding to see that something doesn’t do what it claims to do, than it does to build something that does.

        • yyw says:

          You don’t need to know the technical details of mediation or path analysis or SEM to understand what the model means. Is this a compelling model well supported by established theory? What’s the causal interpretation for the direct path? It’s probably my bias, but I am suspicious of models using intermediate outcomes as predictors since a counterfactual interpretation seems impossible.

          Even if you believe in this sort of mediation analysis, is fear avoidance a good mediator here? If you look at the coefficients for GET vs APT on physical function, the direct path and the indirect path have effects in the opposite directions. I guess this could plausibly happen in reality but it is more likely a type S error (or a typo?).

          • Andrew says:

            Yyw:

            Yes. Real people are suffering from systemic exercise intolerance disease, and I think the burden is on these researchers to demonstrate that these complicated and controversial procedures make sense. It’s not our job to untangle these methods.

          • Luigi Leone says:

            Direct and indirect effects with opposite signs could be found, and might make sense. A clear example was given by Bollen in his beloved (by me) book “Structural equation models with latent variables” (1989). Imagine the association between intelligence and errors in a dull task. You obtain a correlation (total effect) of 0. However, when you include a mediator (boredom proneness), you obtain a negative direct effect of intelligence, which says that the smarter you are the less errors you make (seems to make sense); however, intelligence is correlated POSITIVELY with boredom proneness, which in turn relates POSITIVELY with errors (the more bored you are, the less attentive to the dull task and therefore you make more errors). Thus, you would have a positive indirect effect (I would prefer to say association instead than effect, though) of intelligence on errors, and a negative direct effect of intelligence on errors. No need for typos, no need for S errors (which of course could occur nonetheless).
            I apologize for my bad English.
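The sign pattern described in the comment above can be sketched with a small simulation. This is only an illustration under assumed values: the coefficients `A`, `B`, and `DIRECT`, the standard normal noise, and the variable names are all hypothetical and are not taken from Bollen's book or from the paper under discussion. The path coefficients are estimated by ordinary least squares rather than by a structural equation package.

```python
import random

def centered(v):
    mean = sum(v) / len(v)
    return [vi - mean for vi in v]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def slope(x, y):
    """OLS slope of y on x (intercept included via centering)."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)

def slopes2(x, m, y):
    """OLS slopes of y on x and m jointly (intercept included via centering)."""
    xc, mc, yc = centered(x), centered(m), centered(y)
    sxx, smm, sxm = dot(xc, xc), dot(mc, mc), dot(xc, mc)
    sxy, smy = dot(xc, yc), dot(mc, yc)
    det = sxx * smm - sxm ** 2
    return (smm * sxy - sxm * smy) / det, (sxx * smy - sxm * sxy) / det

# Assumed (hypothetical) path coefficients, chosen so that A * B + DIRECT = 0.
A, B, DIRECT = 0.5, 0.6, -0.3

rng = random.Random(0)
n = 20000
intelligence = [rng.gauss(0, 1) for _ in range(n)]
boredom = [A * x + rng.gauss(0, 1) for x in intelligence]
errors = [DIRECT * x + B * m + rng.gauss(0, 1)
          for x, m in zip(intelligence, boredom)]

a_hat = slope(intelligence, boredom)                       # intelligence -> boredom
direct_hat, b_hat = slopes2(intelligence, boredom, errors)  # direct path and boredom -> errors
total_hat = slope(intelligence, errors)                    # total association
indirect_hat = a_hat * b_hat

print(f"direct   = {direct_hat:+.3f}")
print(f"indirect = {indirect_hat:+.3f}")
print(f"total    = {total_hat:+.3f}")
```

In a linear least-squares fit the total slope equals the direct slope plus the product of the two indirect-path slopes, so a near-zero total is compatible with a negative direct and a positive indirect estimate. Whether either number deserves a causal reading is a separate question that the arithmetic does not settle.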

            • Luigi

              Those observations sound kinda iffy to me. One can be superbly attentive and err, if one is confronted with a weak theory, measurement, and practice.

              BTW, ahemmmmm it’s ‘fewer’ errors. From moi the Mistress of the Parenthesis. lol

  4. Marius says:

    These kinds of mediation analyses are quite popular in my field (you can probably guess what that might be), and I’ve been asked to look into them for some future projects. I have a vague sense of distrust about them, but I haven’t come across any particularly good overviews of the problems with them. Does anyone have a good reference they would recommend?

  5. Pat says:

    I wonder what Judea Pearl thinks about this conversation, interested in getting his perspective.

  6. Anoneuoid says:

    I just looked at this… Emphasis mine:

    [from supplement:]
    The data were simulated using the Mplus MONTECARLO command, with parameter estimates obtained from fitting a model to PACE data standardized to baseline values of the mediator and outcome. The parameters from PACE were modified and then used as population and coverage values in the MONTECARLO command. The parameters used are shown in Figure S2. Two datasets of n = 320 were simulated using the modified latent change score model (presented in the manuscript, used because it had the best fit to the real trial data), each with a binary treatment group variable. The two datasets were then merged to form a dataset with four treatment groups.

    The simulated dataset is called longitudinal mediation.dat. The dataset contains the following variables: y0, y1, y2, y3, m0, m1, m2, m3, r1, r2, r3, and r4 in that order. These variables are capitalized in the article and in some of the text in the supplement, but are lower case in the data set, the Mplus code, and references to the code. The Y variables represent measures of the outcome at baseline (Y0), and at three time points post-randomization, with the M variables representing mediator measurements taken at the same time points. The four treatment group variables R1, R2, R3, and R4 are dummy coded variables coded = 1 for membership in that randomized group and = 0 otherwise. These variables are described in more detail in the body of the manuscript.

    [from paper:]

    Data were simulated using Mplus, with parameters from a model fitted to the PACE data that were modified before being used in the simulation algorithm (Figure S2 of the online supplemental materials). The data were simulated based on PACE but are not actual PACE data. The data set contains the binary 0, 1 coded variables R1, R2, R3, and R4, which represent four treatment groups; four mediator measurements, M0 at baseline and three postrandomization time points, M1, M2, and M3; and corresponding outcome measures, Y0, Y1, Y2, and Y3 (the variable names are lower case in the data set and Mplus code). The R1 – R4 variables broadly correspond to the CBT (R1), APT (R2), and GET (R3), and SMC control treatment (R4). R1 and R3 represent active treatments, with R2 as an example of a treatment that differs little from the control. The simulated mediator data was based on a measure of fear avoidance (Knudsen, Henderson, Harvey, & Chalder, 2011; Moss-Morris & Chalder, 2003; Skerrett & Moss-Morris, 2006), which is scored as higher is worse. The simulated outcome data was based on the physical functioning outcome (Buchwald, Pearlman, Umali, Schmaling, & Katon, 1996; McHorney, Ware, & Raczek, 1993), which is scored as higher is better. These simulated data and further detail on the simulation have been provided in the online supplemental materials. We have also provided a plot of the observed means over time in the PACE data in Figure S3 in the supplemental material to aid later discussion of trajectories of change.

    The data used to generate the parameters used for simulation were standardized to baseline for the mediator and the outcome. We will leave whether and how to standardize to the discretion of the reader, as there is some disagreement on this subject in the literature (Baguley, 2009; Cheung, 2009). Standardizing gives effects that are arguably in more interpretable units of measurement, that is, standard deviation (SD) units, compared with the original scales of the measures (Cheung, 2009; MacKinnon, 2008). Standardization in mediation helps with the use of different scales for mediator and outcome, and potentially allows for cross-measure comparisons. Using baseline values to standardize provides measurement units that are retained across time, scaled independently of the treatment receipt effects, and indirect effects expressed in units of baseline SD of the outcome (i.e., the SD of the mediator cancels out when calculating the indirect a b effect). Standardization may be particularly important for the indirect effect, which, as a product of two estimates, may be challenging to interpret if calculated using the original variable scales (Cheung, 2009; MacKinnon, 2008; Preacher & Kelley, 2011).
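The parenthetical claim in the quoted passage, that the mediator's SD cancels out of the indirect effect, can be checked with a short calculation. The notation here is an assumption for illustration and is not the paper's: $a$ is the raw treatment-to-mediator coefficient, $b$ the raw mediator-to-outcome coefficient, $\sigma_M$ and $\sigma_Y$ the baseline SDs of mediator and outcome, and starred quantities are the coefficients after dividing $M$ by $\sigma_M$ and $Y$ by $\sigma_Y$, with the 0/1 treatment indicator left unscaled.

```latex
\begin{aligned}
a^{*} &= \frac{a}{\sigma_M}, \qquad b^{*} = b\,\frac{\sigma_M}{\sigma_Y},\\[4pt]
a^{*} b^{*} &= \frac{a}{\sigma_M}\cdot b\,\frac{\sigma_M}{\sigma_Y} = \frac{ab}{\sigma_Y}.
\end{aligned}
```

So, under these assumptions, the standardized indirect effect is the raw product $ab$ expressed in baseline SD units of the outcome, whatever scale the mediator was measured on.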

    I really can’t tell what any of these numbers in the diagrams mean. It’s all so abstracted away from the actual measurements, which I’m sure are in turn only loosely related to what they want to measure. What are we supposed to do with the coefficient values in eg figure 1? For example, coefficient b_L = -.05 [-.10, -.001]? I guess this is supposed to be the lagged effect of the “true latent mediator” at time 1 and 2 on the “true latent outcome” at times 2 and 3. Ie, the standardized/etc lagged effect of “fear avoidance” on “physical functioning”. What do I do with this?

    Also, can anyone who uses Mplus check whether changing these models in slight, arbitrary ways affects the coefficients? Does coding the “treatment group variable” R1 as 0 for membership and 1 for non-membership (the reverse of what they currently used) change these results? What if you switch to using -0.5, 0.5 rather than 0, 1?
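For a plain least-squares regression on a single binary indicator, the recoding question has a known answer, sketched below. This is not the paper's latent change score model and does not use Mplus, so it says nothing definite about how those fitted models would respond. The data, the effect size of 0.4, and the variable names are hypothetical; only the sample size of 320 echoes the supplement.

```python
import random

def ols(x, y):
    """Return (intercept, slope) of the least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

rng = random.Random(1)
r = [rng.randint(0, 1) for _ in range(320)]    # hypothetical 0/1 treatment indicator
y = [0.4 * ri + rng.gauss(0, 1) for ri in r]   # hypothetical standardized outcome

int_01, slope_01 = ols(r, y)                          # original 0/1 coding
int_flip, slope_flip = ols([1 - ri for ri in r], y)   # reversed 1/0 coding
int_half, slope_half = ols([ri - 0.5 for ri in r], y) # -0.5/+0.5 coding

print(f"0/1 coding:       intercept={int_01:+.3f} slope={slope_01:+.3f}")
print(f"reversed coding:  intercept={int_flip:+.3f} slope={slope_flip:+.3f}")
print(f"-0.5/+0.5 coding: intercept={int_half:+.3f} slope={slope_half:+.3f}")
```

In this simple case reversing the coding only flips the sign of the slope and moves the intercept to the other group's mean, while the -0.5/+0.5 coding leaves the slope alone and moves the intercept to the midpoint of the two group means. Models with interactions, several dummy variables, or latent variables can respond differently to recoding, so this does not replace the Mplus check being asked for.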

  7. Z says:

    Andrew, your attitude toward mediation reminds me of your attitude toward instrumental variables. In both cases, there are completely rigorous explanations of which counterfactual estimands can be estimated using which estimators and under which assumptions. It’s totally fine to say, “people tend to use these methods even when the assumptions aren’t met”, or “in this analysis I don’t believe the necessary assumptions hold”. But you say (paraphrasing), “these methods generally don’t make sense”, which is really a mathematically false statement.

    • Andrew says:

      Z:

      The graph shown above involves about 20 assumptions, approximately 0 of which have been seriously considered. The model is being used to estimate things that just can’t realistically be estimated from the available data. If someone wants to take a serious model and add data to it, that’s fine—that’s what is done, for example, in pharmacometrics. There’s a big difference between (a) taking a model that makes some sense and then seeing how much more can be learned using some data, and (b) taking some data and then putting it into a model that has no justification, other than that it can be used to make quasi-certain statements about questions of interest.

      Also, no, I don’t say that instrumental variables methods generally don’t make sense, nor do I say the equivalent of that, nor do I think that. Take a look, for example, at the discussion of instrumental variables in my book with Jennifer Hill. I think that the idea of an instrumental variable is, in a certain way, fundamental to causal inference.

      • Ram says:

        Suppose we have a large, simple random sample from our population of interest, which we randomize into treatment v. control. Assume we have no missing data, and everyone complies with their treatment assignment. Now suppose we measure outcome 1 (binary) at time T, and outcome 2 at time T + dt for every subject. The mean difference between groups in outcome 1 is positive, and precisely estimated (assessed using a nonparametric bootstrap CI). The mean difference between groups in outcome 2 is also positive and precisely estimated (b1). Now we (linearly) regress outcome 2 on group and outcome 1. The coefficient on outcome 1 is positive and precisely estimated, and the coefficient on group is also positive and precisely estimated (b2), but smaller than b1 (and b1 – b2 is also precisely estimated).

        This may not be your preferred design or analysis plan, and perhaps you think such a clean setup is not realistic in the problems you work on, but suppose results like this were presented to you. What would be your interpretation of these results? Note that the only “modeling” assumption is no interaction between group and outcome 1 in determining the conditional expectation of outcome 2, and the CIs give approximately correct coverage regardless of the distribution of the errors.
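A minimal simulation of this setup, with made-up effect sizes and hypothetical names (`out1`, `out2`, `estimates`, `bootstrap_ci`), may help pin down what the reported numbers are. It computes b1, b2 and a percentile bootstrap interval for b1 - b2, and also checks that with OLS the difference b1 - b2 is identical to the product of the group difference in outcome 1 and the coefficient on outcome 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
group = rng.integers(0, 2, size=n).astype(float)           # randomized assignment
out1 = (rng.random(n) < 0.3 + 0.3 * group).astype(float)   # binary outcome 1 at time T
out2 = 0.2 * group + 0.8 * out1 + rng.normal(size=n)       # outcome 2 at time T + dt

def ols(columns, y):
    """OLS coefficients with an intercept prepended: [intercept, *slopes]."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    return np.linalg.lstsq(X, y, rcond=None)[0]

def estimates(g, o1, o2):
    a = ols([g], o1)[1]            # mean difference between groups in outcome 1
    b1 = ols([g], o2)[1]           # mean difference between groups in outcome 2
    _, b2, b = ols([g, o1], o2)    # coefficients on group and on outcome 1
    return a, b, b1, b2

def bootstrap_ci(g, o1, o2, reps=500, level=0.95):
    """Percentile bootstrap interval for b1 - b2, resampling subjects."""
    diffs = np.empty(reps)
    for i in range(reps):
        idx = rng.integers(0, len(g), size=len(g))
        _, _, b1_i, b2_i = estimates(g[idx], o1[idx], o2[idx])
        diffs[i] = b1_i - b2_i
    return np.quantile(diffs, [(1 - level) / 2, (1 + level) / 2])

a, b, b1, b2 = estimates(group, out1, out2)
lo, hi = bootstrap_ci(group, out1, out2)
print(f"b1={b1:.3f}  b2={b2:.3f}  b1-b2={b1 - b2:.3f}  a*b={a * b:.3f}")
print(f"95% bootstrap interval for b1-b2: [{lo:.3f}, {hi:.3f}]")
```

The identity b1 - b2 = a*b is pure least-squares algebra and holds however the data were generated, so a precisely estimated b1 - b2 does not by itself say anything about mechanism. Reading it as a mediated effect requires the extra causal assumptions under discussion here.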

        • yyw says:

          Ram,

          Do you assume the link between outcome 1 and outcome 2 to be causal?

          • Ram says:

            What I’m getting at is which assumptions (causal and/or non-causal) need to be introduced to this setup to give these results the usual mediation interpretation. Presumably the assumption you mention is one of these. But the scenario I’m describing is agnostic about the truth of any such assumptions. Someone does a mediation analysis in a setting where most of the usual statistical issues can be safely put to one side, and the question is how should we interpret it under different assumptions one might make.

            • yyw says:

              Without the causality assumption, I don’t think the direct effect b2 is interpretable in general even in the simple scenario you described.

            • yyw says:

              Even with a causal link established between outcome 1 and outcome 2, additional assumptions would be needed to make b2 interpretable. Suppose outcome 1 is a desired action that is known to improve the eventual outcome (2). The direct effect then compares those who take the action after intervention with those who take the action without intervention, and those who do not take the action even after intervention with those who don’t take the action in control. If the intervention has an effect on outcome 1, then direct effect estimation in general is comparing populations with different characteristics.

        • Anoneuoid says:

          Let’s say a group eats a big meal, then water fasts for a few days. You measure “hunger” and record any “bowel movements” at a few timepoints up to 72 hours later. A second control group just does a water fast.

          You will first see hungriness decrease (relative to the controls) after eating (iirc, this was proven in one of the Wansink papers), and then later see “number of bowel movements” increase relative to the controls. Further, smaller hunger decreases are linked to the bowel movements appearing sooner after the meal.

          Did the decrease in hungriness cause the bowel movements?

          • Glyrblör the cloud of blood says:

            I’m not sure if that important result has been in any paper by Wansink, but something like that was in a paper which was discussed on this blog some time ago:

            https://andrewgelman.com/2016/07/08/29495/

            But maybe indeed this result has been independently replicated by other groups. You can’t hold back the ever-growing accumulation of scientific knowledge.

            • Anoneuoid says:

              Thanks, pretty sure that is what I was thinking of. Anyway, I hope the errors in my post didn’t make it too confusing. The point was that eating relieved hunger and caused the gastrocolic reflex, which results in a lagged bowel movement.

              This is in no way evidence for relieved hunger causing bowel movements. Although I would not at all be surprised that eg relieving hunger in the absence of food does affect bowel movements in some way, there is no reason to think whatever coefficient is reported by this mediation analysis would reflect the strength/direction of that relationship.

              So right now this mediation analysis seems *really, really bad* to me. I only looked at that one paper about it so take it for what it’s worth.
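The eating example can be put into a toy simulation. Everything here is an assumption made for illustration: the numbers are invented, and `gut_response` is a hypothetical unmeasured variable standing in for whatever makes some people both stay hungrier and have more bowel movements. Hunger has no effect at all on bowel movements in the simulated data, yet the usual regression-based mediation estimate reports a sizeable "indirect effect".

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
ate = rng.integers(0, 2, size=n).astype(float)  # R: randomized to meal or no meal
gut_response = rng.normal(size=n)               # unmeasured; hypothetical common cause

# M: hunger. Eating lowers it; gut_response raises it.
hunger = -2.0 * ate + gut_response + rng.normal(size=n)
# Y: bowel movements. Caused by eating and gut_response only, NOT by hunger.
bowel = 1.5 * ate + gut_response + rng.normal(size=n)

def ols(columns, y):
    """OLS coefficients with an intercept prepended: [intercept, *slopes]."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    return np.linalg.lstsq(X, y, rcond=None)[0]

a = ols([ate], hunger)[1]                    # effect of eating on hunger
_, c_prime, b = ols([ate, hunger], bowel)    # "direct" effect and mediator slope

print(f"a={a:.2f}  b={b:.2f}  estimated indirect a*b={a * b:.2f}  (true value: 0)")
print(f"estimated direct c'={c_prime:.2f}  (true value: 1.5)")
```

Under these assumptions the estimated slope b comes out near 0.5 and the estimated indirect effect near -1, both created entirely by the unmeasured common cause. Randomizing the treatment protects the a path but does nothing for the mediator-to-outcome path.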

              • ojm says:

                I’m no expert here but it sounds like what you’re describing is eating as a confounder of hunger -> bowel movements, not a mediator.

                To establish mediation requires, I believe, something like showing artificially increasing hunger while holding eating constant affects bowel movement.

                I can buy Andrew’s point that complex networks not really based on real theory are pretty meaningless, especially when combined with significance testing, but I imagine these methods aren’t completely terrible in a simple three variable setting.

  8. Anonymous says:

    Some of this discussion, especially Andrew’s response to Z above, reminds me of this article from Sander Greenland

    Greenland, S., 2010. Overthrowing the tyranny of null hypotheses hidden in causal diagrams. Heuristics, probability and causality: A tribute to Judea Pearl, pp.365-382.

    A pdf is linked here:
    https://scholar.google.com/scholar?hl=en&as_sdt=0%2C20&q=Overthrowing+the+tyranny+of+null+hypotheses+hidden+in+causal+diagrams&btnG=

  9. Anoneuoid says:

    ojm wrote

    I’m no expert here but it sounds like what you’re describing is eating as a confounder of hunger -> bowel movements, not a mediator.

    To establish mediation requires, I believe, something like showing artificially increasing hunger while holding eating constant affects bowel movement.

    I can buy Andrew’s point that complex networks not really based on real theory are pretty meaningless, especially when combined with significance testing, but I imagine these methods aren’t completely terrible in a simple three variable setting.

    I tried to mimic the PACE trial scenario:

    R = eating or not = treatment groups
    M = feelings of hunger = fear avoidance
    Y = bowel movements = physical functioning

    Can you explain how the procedure there is different from the study I described?

    • ojm says:

      Honestly, I’m a bit drunk. They might have done it wrong, but it seems like they could do it right.

      • ojm says:

        I assume they do something like compute an effect of the mediator, holding the treatment fixed, and the influence of the treatment on the mediator, and then multiply these together or whatever to get the indirect/mediated effect path (which is only valid in a linear model framework, but still).
