Acupuncture paradox update

The acupuncture paradox, as we discussed earlier, is:

The scientific consensus appears to be that, to the extent that acupuncture makes people feel better, it is by relaxing the patient; the acupuncturist might also help in other ways, such as encouraging the patient to focus on his or her lifestyle.

But whenever I discuss the topic with any Chinese friend, they assure me that acupuncture is real. Real real. Not “yeah, it works by calming people” real or “patients respond to a doctor who actually cares about them” real. Real real. The needles, the special places to put the needles, the whole thing. I haven’t had a long discussion on this, but my impression is that Chinese people think of acupuncture as working in the same way that we understand that TVs or cars or refrigerators work: even if we don’t know the details, we trust the basic idea.

Anyway, I don’t know what to make of this. The reports of scientific studies finding no effect of acupuncture needles are plausible to me (not that I’ve read any of these studies in detail)—but if they’re so plausible, how come none of my Chinese friends seem to be convinced?

This does seem to be a paradox, as evidenced by some of the discussion in the 56 comments on the above post.

Anyway, I was reminded of this when Paul Alper pointed me to this news article from Susan Perry, entitled “Real and fake acupuncture have similar effects on hot flashes, study finds”:

Women reported improvements in the number and intensity of their hot flashes whether they received the real or the fake treatment — a strong indication that the placebo effect was at work with both. . . . And before anybody jumps on this study for being conducted by conventional physicians who are antagonistic to nonconventional medical treatments, I [Perry] will point out that the lead author is Dr. Carolyn Ee, a family physician at the University of Melbourne who is trained in — and uses — Chinese medicine, including acupuncture, with her patients.

Here’s the study, which reports:

Results: 327 women were randomly assigned to acupuncture (n = 163) or sham acupuncture (n = 164). At the end of treatment, 16% of participants in the acupuncture group and 13% in the sham group were lost to follow-up. Mean HF scores at the end of treatment were 15.36 in the acupuncture group and 15.04 in the sham group (mean difference, 0.33 [95% CI, −1.87 to 2.52]; P = 0.77). No serious adverse events were reported.
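(As a quick consistency check, not part of the study: under a normal approximation, the reported confidence interval and p-value line up. A minimal sketch:)

```python
# Back out the standard error implied by the reported 95% CI for the
# mean difference (0.33 [-1.87, 2.52]) and confirm it reproduces the
# reported two-sided P = 0.77 under a normal approximation.
import math

diff = 0.33
lo, hi = -1.87, 2.52
se = (hi - lo) / (2 * 1.96)   # half-width of a 95% normal CI, over 1.96
z = diff / se
p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # two-sided p, ~0.77
```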

53 thoughts on “Acupuncture paradox update”

  1. The effectiveness seems to depend on the condition being treated: http://www.cochrane.org/search/site/acupuncture

    But there seems to be evidence in favor of (small) effects for preventing migraine, comparable in some cases to other prophylactic treatments. For example:
    http://www.cochrane.org/CD001218/SYMPT_acupuncture-preventing-migraine-attacks

    I’d be interested to hear if someone finds Cochrane unconvincing. Their methodology seems *very* solid to me.

  2. I like that Andrew is properly accounting for evidence from the beliefs of others – Aumann would approve. It makes me wonder, however, what the model is for taking the beliefs of others seriously. In this case, I’d think that, in the sense discussed here: http://lesswrong.com/lw/lx/argument_screens_off_authority/, evidence screens off the authority of others, unless they have unshared evidence.

    In this case, the fact that others hold beliefs strongly would function as useful evidence prior to the studies cited. That prior, though, should be based on a generative model of their belief. (Not that modeling this is simple, but that’s the problem in general with priors that come from hard to model evidence.) Informally, the strong prior for efficacy should mean that, once evidence for no effect is found, more of the probability mass is shifted to the proposition that there are group beliefs in false claims, not towards conserving belief in the efficacy of acupuncture.
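    A toy numeric version of this argument (all priors and likelihoods are invented for illustration): widespread belief gives efficacy a high prior, but a null trial result is far more likely under “shared false belief” than under “real effect,” so most of the posterior mass shifts away from efficacy.

```python
# Toy two-hypothesis Bayesian update.  The numbers are made up for
# illustration, not estimated from any data.
prior_effect = 0.7            # prior that the specifics really work,
prior_false_belief = 0.3      # informed by how widely it is believed

lik_null_given_effect = 0.1        # a good trial finds nothing anyway
lik_null_given_false_belief = 0.9  # null expected if the belief is false

num = prior_effect * lik_null_given_effect
den = num + prior_false_belief * lik_null_given_false_belief
posterior_effect = num / den       # ~0.21: most mass has shifted away
```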

    • David:

      Interesting point; I hadn’t thought of it that way. Another thing to factor into the analysis is the interaction with nationality: my Chinese friends have much different attitudes about acupuncture than do my non-Chinese friends. And of course there’s systematic variation within national groups too. So it’s not just about there being some people who think acupuncture really works; there are clearly other factors going on here.

  3. My Chinese friend did a review of acupuncture literature a few years ago and presented the findings in a journal club. His mother performed acupuncture in China and he was quite convinced of its benefits. I was surprised when he dismissed a European study because it reported adverse events such as itching at the needle site or infection. He claimed that adverse events never occurred in Chinese acupuncture and therefore these reports showed that it was being done incorrectly in the West.

  4. It doesn’t help that western studies of acupuncture are full of the usual statistical bullshit that western studies of other stuff routinely get wrong… For example the study I referred to here:

    http://models.street-artists.org/2016/06/13/no-no-no-no/

    and a critique of the specific issues:

    http://models.street-artists.org/2016/06/14/what-should-the-analysis-of-acupuncture-for-allergic-rhinitis-have-looked-like/

    Also, we need to test several things:

    1) Does getting poked by needles help?
    2) Does the location of the poke matter?
    3) Does the mere sight of a needle (without significant needle penetration) help?
    4) Does just going to someone who cares about you, listens to you, and gives you helpful management suggestions help?
    5) Does the secondary stuff related to herbs etc. that typically comes with a visit to a Chinese medical practitioner help? If so, for a given individual, which herbs are actually the effective agents?
    6) Does your belief about whether you’re getting real vs. sham treatment affect it?
    7) Etc.… there are probably at least several other things we should be testing.

    Each of these is mixed together in typical “real vs. sham” studies, and the effects of the different potential causes need to be teased out. Since there are whole schools of “real” acupuncture, you might call that a single thing, but there is no “standard” for “sham” acupuncture, so from one study to the next the “sham” is not a consistent single thing.
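    Fully crossing the factors above would mean a factorial design with a lot of cells. A toy enumeration (factor names and levels are mine, for illustration, not a proposed protocol):

```python
# Toy enumeration of a full factorial design over a few of the factors
# listed above.  Factor names and levels are illustrative only.
from itertools import product

factors = {
    "needle_penetration": ["real", "sham_shallow", "sight_only"],
    "location": ["traditional_points", "random_points"],
    "practitioner_attention": ["high", "low"],
    "herbs": ["yes", "no"],
}

conditions = list(product(*factors.values()))
n_conditions = len(conditions)    # 3 * 2 * 2 * 2 = 24 cells
```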

  5. I dunno … how many of your Evangelical Christian statistician friends believe that prayer has real salutary effects?

    The selection effect of statistician against Evangelical Christian may be stronger than it is against Chinese, but if we were to compare like with like, then the analogy ought to hold up: Individuals raised in a culture that takes certain beliefs for granted are more likely to resist evidence against those beliefs even when trained to judge such evidence against a uniform standard.

    Here are some suggested moderating factors:
    -Motivated reasoning–particularly by concerns about health or bodily integrity–strengthens this resistance to counterevidence.
    -There may not be much of a market for liberalized (or metaphorical) variations on the core belief.

  6. I don’t find this paradoxical; it would be rather surprising if it weren’t happening. NHST only measures prevailing opinion, and NHST is a very popular “tool” amongst medical researchers worldwide these days. So we should expect a division along cultural lines for topics like acupuncture. Is it the case that the research literature follows Andrew’s anecdotal experience? I don’t know, but it wasn’t difficult to find “pro-acupuncture” literature by authors with Chinese names:

    http://www.ncbi.nlm.nih.gov/pubmed/27631690

  7. Susan Perry is referring to the following technical article:

    http://annals.org/article.aspx?articleid=2481811

    Unfortunately, the full article is behind a paywall, but it does say that “Limitation: Participants were predominantly Caucasian and did not have breast cancer or surgical menopause.” Consequently, no matter what Bayesians may think of p-values and confidence intervals, a result of

    “95% CI, −1.87 to 2.52; P = 0.77”

    certainly strongly hints that Caucasian women suffering from hot flashes are wasting money on acupuncture.

    • >’Consequently, no matter what Bayesians may think of p-value and confidence intervals, a result of “95% CI, −1.87 to 2.52; P = 0.77”, certainly strongly hints that Caucasian women suffering from hot flashes are wasting money on acupuncture.’

      This thought process completely ignores how the data was collected, what was actually measured, etc. It is totally illegitimate.

      First, the therapists obviously could not be blinded in this study:
      “Participants, outcome assessors, and investigators were blinded to treatment allocation; the acupuncturists and the research assistant who randomly assigned participants were not. The self-reported outcome assessments were blinded.”

      Second, you are extrapolating well beyond the inclusion criteria:
      “Women were included if they were postmenopausal (>12 months since their final menstrual period) or in the late menopausal transition (follicular-stimulating hormone level ≥25 IU, amenorrhea ≥60 days, and VMSs), had a mean HF score of at least 14 (equal to 7 moderate VMSs daily) (16), or had kidney yin deficiency diagnosed using a structured Chinese medicine history as well as a tongue and pulse examination performed by experienced acupuncturists (Appendix Figures 1 and 2). Kidney yin deficiency, of which night sweats is a cardinal symptom, is a Chinese medicine clinical syndrome diagnosed in 76% to 81% of symptomatic postmenopausal women (17, 18). Women who had had a hysterectomy were included if they were older than 51 years with a follicular-stimulating hormone level of 25 IU or greater.

      Exclusion criteria were needle acupuncture in the preceding 2 years, age younger than 40 years, previous diagnosis of premature ovarian failure and age younger than 50 years, bilateral salpingo-oophorectomy, medical reasons for amenorrhea, poorly controlled thyroid disease, VMSs associated with breast cancer, current HRT use, vaginal estrogen therapy in the previous 12 weeks, treatment of VMSs for the previous 12 weeks, relative contraindications to acupuncture (anticoagulation, heart valve disease, or poorly controlled diabetes mellitus), and unwillingness or inability to adhere to trial requirements or to give informed consent.”

      Third, this HF score is a self-report:
      “The primary outcome was HF score at the end of treatment (EOT) (8 weeks) (16). Participants recorded the number of daily mild, moderate, severe, and very severe HFs for 7 days using a validated HF diary (16). We calculated the HF score using the following equation: ([1 × number of mild HFs] + [2 × number of moderate HFs] + [3 × number of severe HFs] + [4 × number of very severe HFs]) ÷ number of days reported. Thus, an HF score of 14, the minimum to enroll in our trial, may represent 14 mild, 7 moderate, 4.7 severe, or 3.5 very severe HFs per day, or a combination of these. Hot flash scores can include 0 but have no upper limit. Hot flash frequency represents the average number of daily HFs, and severity represents the average severity across all HFs, ranging from 1 (mild) to 4 (very severe).”
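      The scoring formula in that quote is simple to express directly (a sketch of the published equation, not the trial’s actual software):

```python
# Sketch of the HF (hot flash) score defined in the quoted protocol:
# a severity-weighted daily count, averaged over the diary days.
def hf_score(mild, moderate, severe, very_severe, days):
    """Weighted score: 1x mild + 2x moderate + 3x severe + 4x very
    severe, divided by the number of diary days reported."""
    if days <= 0:
        raise ValueError("need at least one diary day")
    return (1 * mild + 2 * moderate + 3 * severe + 4 * very_severe) / days

# The enrollment threshold of 14 can arise in very different ways:
mild_only = hf_score(98, 0, 0, 0, 7)       # 14 mild HFs/day  -> 14.0
very_severe_only = hf_score(0, 0, 0, 24.5, 7)  # 3.5/day      -> 14.0
```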

      The “validation” mentioned in ref 16 consists of arguments like this:
      “Patients who report toxicities that theoretically should be related to hot flash activity (such as abnormal sweating, trouble sleeping, and sleepiness) had significantly higher levels of hot flash activity than those who did not report such toxicities. For toxicities that were unrelated to hot flash activity (eg, appetite loss or nausea), the hot flash activity was the same between patients who reported such toxicities and those who did not. Correlation coefficient analysis produced the same results (data not shown).”

      Anyway, it is a self report, so we need to be concerned about whether the treatment is affecting the rate of reporting, or memory of the events rather than the events themselves.

      This is the important stuff, not the p-values (which are entirely irrelevant to a critical reading of this paper).

      • Some people do some stuff and find that the thing they’re measuring is noisy and that all the stuff they did didn’t make a measurable, practical difference. A prior for effect size also suggests that the effect, if real, is likely not very big. I guess the real question is “did they make a mistake which caused them to cancel out a real effect?” I’d only be likely to read carefully and critically if I had a prior that there was an effect of moderate size, and then I’d be looking for things like lack of blinding causing the practitioners to encourage the groups to report basically similar results…. but overall I think there’s just not much there.

        • It really does not take that much effort to read critically once you ignore all the spurious stuff in these papers.

          1) Were basic techniques like blinding/randomization used? If not, what is the excuse?
          2) What conditions do these results apply to? This is stuff like inclusion/exclusion criteria, diet, etc.
          3) What *exactly* got measured? How were the various factors that may influence these measurements monitored and controlled? How tenuous is the link between these measurements and what the researchers say they actually care about?

          It usually takes 10-30 minutes at most and (sadly) you can dismiss most papers in 5 minutes or less due to lack of information.

          Regarding the possibility of mistakenly cancelling out a real effect, I always thought the Pioneer anomaly might be an interesting case study. In that case there was motivation to get rid of the deviation from the model (this is not NHST here), so they kept adding in various sources of error until the deviation was accounted for: https://arxiv.org/abs/1204.2507

          I’m not saying that is wrong; it’s more just interesting to see what can be done when the motivations are the reverse of what we usually see on this blog.

      • Anoneuoid:
        If I read your comment correctly, you are saying that there are far deeper issues with this study, such as self-reporting, lack of complete double blinding, improper validation, and “totally illegitimate” data collection. Because I know nothing about “kidney yin deficiency” in particular and Chinese medicine in general, I looked at the unimpressive (Western medicine) outcome measures such as the p-value and confidence interval. As Perry points out, “the lead author is Dr. Carolyn Ee, a family physician at the University of Melbourne who is trained in — and uses — Chinese medicine, including acupuncture, with her patients,” and thus one would have expected some sort of meaningful difference between the sham procedure and the treatment. But nothing seems to have happened; hence my non-critical reading of the paper.

        • I’m just saying that if someone looks at a (NHST with nil null hypothesis) p-value and think this is sufficient info to draw any conclusion or inference (or even hint/suggest a conclusion) about the research hypothesis, there is something wrong with their thought process. There are always far deeper issues.

          In fact, these p-values contain no info whatsoever not already found in the effect + sample sizes, and that is already extremely sparse (pitifully insufficient) info. It is really rare for there to be any place for those p-values at all (sometimes they can be validly used to index a likelihood function); I have been skimming over them for years. Someone should offer a service/software that strips them from these documents, provides basic stats (e.g., number of mentions) on their usage, and then re-renders a PDF with a reasonable layout. Providing such a clutter-removal + competency heuristic could be worth quite a bit to people like me, actually…
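          The stripping part of that idea is nearly trivial to prototype (a toy sketch; the regex is illustrative, not a robust parser for real papers):

```python
# Toy "p-value clutter removal": strip p-value reports from plain text
# and count how many were removed.  Illustrative only.
import re

P_VALUE = re.compile(r"\(?\s*[pP]\s*[=<>]\s*0?\.\d+\s*\)?")

def strip_p_values(text):
    """Return (cleaned_text, number_of_p_values_removed)."""
    return P_VALUE.subn("", text)

cleaned, n = strip_p_values(
    "Mean difference 0.33 (P = 0.77); the effect was small (p<.05)."
)
```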

  8. There is a strong analogy to the hot hand “fallacy”. Anyone highly skilled who has played sports like basketball or games like pocket billiards where you take a lot of shots knows that you sometimes feel hot and then make the next shot, other times feel cold, and miss the next shot, and sometimes while hot, suddenly realize you feel different and “know” there is no chance you make the next shot. It’s about expectation regarding success of the next shot more than the sequence that led up to it. That’s tough to measure.
    This may explain why there was such strong resistance to the theory in the sports community and maybe why Chinese who have experienced acupuncture dismiss the studies by Western analysts.

      • Sorry, I was too subtle – see my quotes around “fallacy”. My example was intended to capture how the strong feeling of practitioners (i.e, basketball players and coaches) was given too little weight because it was based on a “feeling” not supported by the statistical results of the researchers and the mistake was not discovered for ~25 years. Not an argument to follow our hearts more than our minds, but to triple-check and stress test results that fly in the face of what the practitioners take for granted based on their inferences from many trials. So we do not want to dismiss the experience of the acupuncturists without thorough analysis.

        • Ah, my mistake. If I recall correctly, though, the estimated streakiness of the hot hand effect is way way smaller than people “feel” it is — the serial correlation is detectable but the outcomes are not as deterministic as all that.

        • Corey:

          That’s what I used to say, too, but it’s tricky. The issue is that serial correlation is an extremely attenuated way of measuring hot hand.

          Suppose, for example, someone is “hot” with a 60% chance of making a shot, or “cold” with a 40% chance, with long periods of hot alternating with long periods of cold. Then the difference between success probabilities, comparing hot and cold periods, is 20%. But the difference in probability of success, comparing just after a success and just after a failure, is (.6^2 + .4^2) – (2*.6*.4) = .04, only 4%. So in this simple scenario, there is a big hot-hand effect but it barely shows up if you measure streakiness by first-order autocorrelations.
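          A quick simulation with the 60/40 numbers from that example (a sketch, not a model of real shooting data) confirms the attenuation:

```python
# Simulate long alternating "hot" (60%) and "cold" (40%) stretches and
# compare the 20-point hot-vs-cold gap with the ~4-point lag-1 gap.
import numpy as np

rng = np.random.default_rng(0)
block = 100                                        # shots per stretch
p = np.tile(np.repeat([0.6, 0.4], block), 5000)    # 1,000,000 shots
shots = rng.random(p.size) < p

prev, curr = shots[:-1], shots[1:]
lag1_gap = curr[prev].mean() - curr[~prev].mean()  # ~0.04, not 0.20
```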

        • Hmmm… now I’m imagining fitting a Bayesian model using a time series for the frequency parameter in a binomial distribution, with the parameter taking piecewise-constant values and the durations and levels as Bayesian parameters with distributions over them… and seeing how well you can recover streakiness in Stan…

        • Corey: after trying to fit a GP to the voting time-series from a month or so back and killing Stan after like 10 minutes before getting a *single* warmup sample, I went immediately to some computationally simpler model in my thought process here.

          The non-discrete computationally attractive alternative would be something like a polynomial basis expansion and the kind of prior I discussed here: http://models.street-artists.org/2016/08/23/on-incorporating-assertions-in-bayesian-models/

          But although I filed a bug report on Chebyshev polynomials, and someone actually figured out how to get them integrated, no one ever pulled the changes and it never got incorporated into Stan. The orthogonal polynomials are much better for this sort of thing, as the coefficients would probably have much less correlation.

          Andrew, if you have a dataset with a series of hot-hand shots and can convince someone to pull Chebyshev polynomials into Stan, I’d be willing to do the hot-hand example as a combined case study and test suite for the Chebyshev stuff.

    • Brad Stiritz:
      Thanks for the interesting link, which is about Ted Kaptchuk.

      “Because I am a damn good healer,” he [Kaptchuk] said. “That is the difficult truth. If you needed help and you came to me, you would get better. Thousands of people have. Because, in the end, it isn’t really about the needles. It’s about the man.”

      • Here is some more from the New Yorker article which Brad Stiritz suggested:

        A placebo effect is commonly observed during trials of blood-pressure medications. To qualify for such studies, subjects are supposed to have blood pressure that exceeds a hundred and forty over ninety in at least one of the two measurements. “As soon as somebody enters those studies, his or her blood pressure falls an average of five or six millimetres of mercury,” [Robert] Temple [of the F.D.A.] said. “That is significant, but it is not a placebo response, and it is not a response to being in the study. It is often the result of doctors’ inflating readings—of rounding up.” If a person’s blood pressure is a hundred and thirty-eight over eighty-eight, for example, investigators will often include him. “When you use an automatic blood-pressure cuff to establish a baseline for these kinds of studies, the entire placebo effect vanishes,” Temple said.

        • That’s a cool little fact; the human desire to round up into inclusion makes sense. But I imagine there could be a non-placebo effect in high-blood-pressure trials as well. You have marginally high blood pressure: why is this? At least one issue is psychological stress, and being worried about your blood pressure is itself a psychological stress. Enrolling in a trial could relieve that component of psychological stress and cause a moderate decrease. I doubt most trials even try to establish a baseline or track the effect of psychological stress, but I’d be happy to be proven wrong.

  9. Isn’t this really in part a discussion about the placebo effect and what it means to term something a placebo? Placebo is not ‘sugar pill’. Similar stuff to Brad’s comment above.

    Part of the confusion when evaluating clinical impact is often around what the ‘placebo effect’ entails. It is not the same as saying ‘imagined effect’ or ‘not real’ effect. This is not just a semantic distinction. There appear to be measurable models of brain functioning indicating that belief/expectation increases the impact of ‘real’ analgesia on pain perception. When analgesia is removed there is, under some conditions and for some types of pain, a residual analgesic impact on pain perception (clinically we always rely on report and observation in the absence of a definitive measure of pain). Prof. Serge Marchand has done some of the most interesting research I’ve read and heard into the neuroscience of the impact of placebo on pain perception. For examples, http://www.ncbi.nlm.nih.gov/pubmed/14527708 and http://www.futuremedicine.com/doi/abs/10.2217/pmt.13.29. For acupuncture, which is funded sparingly in NZ, the available reviews are ‘accepted’ by our main provider of healthcare for injury as showing that it provides relief, but not treatment.

    So, maybe acupuncture does calm and assist the patient, but maybe also it is ‘real’, if by that you mean changing brain functioning and improving analgesic impact. How? Well that’s above my pay grade.

    • Llewelyn:

      Here’s what I wrote in my above post: “But whenever I discuss the topic with any Chinese friend, they assure me that acupuncture is real. Real real. Not “yeah, it works by calming people” real or “patients respond to a doctor who actually cares about them” real. Real real. The needles, the special places to put the needles, the whole thing.”

      So, sure, the placebo effect is real, but presumably it doesn’t depend on where you put the needles. That’s what I meant by “real real”: not just that acupuncture helps people but that the specifics are what makes it work. My Chinese friends express the view that the specifics matter.

      • That is interesting — the whole body map thing and all as being determinative of treatment outcome. Is it Reiki that also makes similar claims? So, maybe the study to propose would be patients assigned to do both treatments, but in different orders. One condition in an fMRI (I want one!) who receive a set number of needles and a condition with someone, possibly me on a bad day, sticking them where I want, possibly even randomly. Compare the ratings of pain relief (probably increase in the condition I treat) plus changes in scans etc etc. Ethics approval might have a couple of holdups but the idea is growing on me. Know anyone who can do the stats?

        • I’d bet there are some places you really shouldn’t stick a needle in someone. But I also bet that, for many areas, any ol’ poking will do. That said, it would have to be any ol’ poking done by someone who a) had the exact same bedside manner, tone, inflections, postures… you know, a real practitioner; b) that practitioner would somehow have to not betray that he was sticking you in places he didn’t believe would help you (assuming most practitioners believe that placement of the needles is very important).

          Although maybe that is a clue on how to do it. I think there are two kinds of important and interesting variation you could generate experimentally to test this. Together I think they would identify the effect pretty well:

          1) Variation using Real and Fake Acupuncture Locations: You get a bunch of acupuncturist trainees and have them trained on a new “technique” for headaches. One group gets the real technique, one group gets a fake technique. Then you compare outcomes between groups. Problem: the “fake” technique might help something else, which then makes the patient feel better overall.

          2) Variation using Better and Worse Placement of Real Acupuncture Locations: some of those students are going to have better needle placement than others. So you can check, within each group, if “technical quality to hit the exact points described by the teacher” correlates with the outcome. If the placement matters a lot, the more accurate trainees should get better results in the “real” placement arm, but not in the “fake” placement arm. Or, if technical ability is correlated with bedside manner/placebo-effect, then the SLOPE of improvement across technical ability should be higher in the “real” treatment group than in the fake treatment group.

          Those are between-practitioner designs, but you could go (even more) expensive and do within-practitioner: Do the first experiment as above (maybe using real practitioners), but using repeated “runs” for each person. The patient has a headache. Sometimes you tell the practitioners the patient has a headache. Sometimes you tell them he has a stomach ache. Now you have a clean within-practitioner experiment. And if you worry that the practitioner knows the person doesn’t have a stomach ache or somehow intuits it, then add in a measure of technical ability/accuracy as above, and see whether the practitioner-level treatment effect of going from “wrong” to “right” needle placements varies with technical ability. That would probably really nail it, if you want to directly test how important the body-mapping/placement bit is, while cleaning out a lot of other potential explanations for any effect you’d find.
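          The identification logic of design 2 can be checked in simulation (a sketch; every effect size here is invented): make accuracy matter only in the real-placement arm, and the per-arm slopes of outcome on accuracy separate cleanly.

```python
# Simulated version of the "slope across technical ability" comparison.
# All effect sizes are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
accuracy = rng.random(n)             # trainee technical accuracy in [0, 1]
real_arm = rng.random(n) < 0.5       # real vs. fake placement assignment
placebo = 0.5                        # common bedside-manner effect
outcome = placebo + 0.8 * accuracy * real_arm + rng.normal(0, 0.5, n)

slope_real = np.polyfit(accuracy[real_arm], outcome[real_arm], 1)[0]
slope_fake = np.polyfit(accuracy[~real_arm], outcome[~real_arm], 1)[0]
# slope_real ~ 0.8 while slope_fake ~ 0: placement interacts with accuracy
```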

        • There are quite a few studies where real practitioners are assigned to do sham acupuncture. I’m not clear on how “well” they can do that. It’s like asking Olympic swimmers to swim in thick goo (Mythbusters did this and found that Olympic swimmers couldn’t swim very well in goo, but everyday “normal” swimmers could swim in goo about as well as in water). Also, I think most of the patients figure it out. You really want sham acupuncture in “incorrect” locations but placed with the same intensity and so forth.

          Generally the studies usually have many other methodological and analysis issues… so it’s very hard to figure out anything useful. If there’s an effect, it’s not enormous that’s for sure.

        • OCD kicked in a little… this one raises some interesting points about how acupuncture might work, a.k.a. being “real real”: http://bmccomplementalternmed.biomedcentral.com/articles/10.1186/1472-6882-6-25

          Moffett (2006) says that: “No study proposed that the neurochemical effects of acupuncture depend on point selection. No study claimed to select points based on neurochemical effects. However, it should be noted that the locations of traditional points are well-established and often correspond to underlying nerves; thus, the selection of traditional points over “non-points” may be justified. Also, the local and segmental effects would logically depend on the needling sites. Certainly, no study proposed that the neurochemical effects depend on means (needles, pressure, electricity, laser, heat or ointment) or method (depth, style, frequency or intensity) of stimulation. While there is great emphasis on point selection and stimulation technique in traditional acupuncture, the neurochemical response to acupuncture may not depend on them.” And, there were studies using the near miss methodology of placement!!

          Still, I’m not clear how a needle stimulates endogenous opioid release, this being the dominant mechanism posited. No study measured opioid release levels, although some did use fMRI. And isn’t it interesting how many published studies of the same technique sway toward markedly different mechanisms of action and yet claim the same outcomes? So all or many of the reviewed studies performed statistical analyses, I assume, yet without an underlying model they may only have added weight to confusion rather than clarity. Evidence toward being Real real???

        • Llewelyn: I think that’s a really nice article. The author’s point is ultimately that without actually testing a mechanism, testing “if there is or is not an effect” in such a noisy area of study basically gets you nowhere. If you hypothesize a specific mechanistic effect and do a detailed study of whether that effect plays out, you can at least rule out that effect, and if the effect *is* real, then you can in fact discover something about it even in the presence of a lot of noise. Whereas with an anonymous “there might be an effect” you are hopelessly lost.

          Nevertheless, as seems common in medicine, this idea seems not to have caught on, and acupuncture trials shooting blindly in the dark continue.

          As for “how a needle stimulates endogenous opioid release,” I can imagine that simply a localized temporary pain-neuron stimulation could do this; endogenous opioids respond to pain, and people I’ve known who’ve done acupuncture say they experience definite localized “pain”-like effects. Acupuncture practitioners often come along and “tweak” the needles (i.e., flick them with a finger, or tap them with a mallet, or heat them, or sometimes stimulate them electrically). I’ve been told that the process can provoke quite a *zing* reaction, a little like a mild electric shock (my wife, mother, and a few friends have done acupuncture, and a neighbor’s daughter is studying it at a school here in the LA area, but I’ve never done it myself).

          So, if “sham” acupuncture sometimes does and sometimes doesn’t stimulate similar pathways based on what is considered appropriate for “sham” in each individual trial, then in fact you could see highly variable results from trial to trial. The needle placement could in fact be relatively unimportant, but the various manipulations and the depth of penetration and so forth could be important… I’m not willing to say acupuncture has no effect, but I AM willing to say that modern methods of experimental design and statistical analysis in Western clinical trials are totally inadequate for discovering anything but fairly large, consistent effects (such as perhaps the effectiveness of prednisone on chronic asthma exacerbations requiring hospitalization or the like).

          Anonuoid tends to expand on this theme, and for the most part I agree with him/her: the lack of mechanistic scientific thinking leaves medicine hugely handicapped.
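          To make the point above concrete, here is a toy simulation (with made-up effect sizes, not data from any actual trial) of how trial-specific definitions of “sham” could produce wildly variable results even if the real procedure has a genuine, fixed effect:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical numbers: suppose "real" acupuncture stimulates some pain
# pathway with a standardized effect of 0.3, while each trial's "sham"
# procedure stimulates the same pathway to an unknown, varying degree
# between 0 and 0.3, because "sham" is defined differently in each trial.
n_trials, n_per_arm, real_effect = 20, 50, 0.3

significant = 0
for _ in range(n_trials):
    sham_effect = rng.uniform(0.0, real_effect)     # varies trial to trial
    real = rng.normal(real_effect, 1.0, n_per_arm)  # outcomes, sd = 1
    sham = rng.normal(sham_effect, 1.0, n_per_arm)
    se = np.sqrt(real.var(ddof=1) / n_per_arm + sham.var(ddof=1) / n_per_arm)
    if abs(real.mean() - sham.mean()) / se > 1.96:  # two-sided 5% z-test
        significant += 1

print(f"{significant} of {n_trials} trials detect a real-vs-sham difference")
```

          Under these assumptions most trials fail to distinguish real from sham, not because nothing is happening, but because the comparison arm is partially doing the same thing.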

      • I’ve tried acupuncture out and can totally assure you that the places where needles are put and the insertion technique can make a tremendous difference in the immediate feeling. It goes from total discomfort, through brief pain or even a lack of any perception, to a kind of welcome muscle awakening. Also, the state of the muscles and nerves has a huge influence on this immediate feeling. And we’re not even talking about the mid-term and long-term effects of it all, just the insertion.
        You can’t really have an RCT here, and there are so many confounding factors at play (which studies mostly disregard) that I find it hopeless at the moment to draw any serious conclusion. Especially with such prejudiced and polarized views around, as also reflected in the comments here (which I find very sad, given the sophisticated audience one would expect), it’s mostly GOFPs and little signal. Maybe warm-blooded robots should do it?

        Anyway, here’s a meta-analysis showing efficacy in pain treatment:
        https://www.ncbi.nlm.nih.gov/pubmed/22965186
        So even w.r.t. current “knowledge” I wouldn’t be so sure that the specifics don’t matter; where did such an assumption come from?

        It’s always a good exercise to be skeptical of *both* camps, although that’s never easy in practice without continuous training; it’s kind of a chicken-and-egg problem. It’s even more difficult when inconclusive results are discouraged by current publishing practice.

        • Given my Biology background and the even stronger Biology background of close family and friends, I can see plenty of mechanisms by which acupuncture could work. The typical studies I’ve read simply do not test these mechanisms; they use the RCT machinery to test whether there is an effect, and, given the noise and the lack of mechanism, they are hopelessly incapable of making definitive determinations.

          I personally believe that it is likely there is an effect, that it does depend on needle placement and manipulation, and that the effect has absolutely nothing to do with the pseudoscientific way in which acupuncture is taught (i.e., chi stagnation along meridians or whatever). That being said, if there is a biological basis, and the whole rigamarole about chi stagnation is sufficiently correlated with that biological basis, then yes, the needle placement could be important, and the teachings could be helpful in choosing needle placement.

          So, until someone with a strong biological, biochemical, and physiological background comes along and posits a variety of models, and then goes out to test those specific models with measurements designed to distinguish between them, and studies the whole thing in a scientific way, we will not have any resolution. RCTs are as close to non-scientific as you can get without actually crossing the line. That is, typically, their purpose isn’t to investigate and build a model of the mechanism. They’re a crude way to determine whether there is a moderately consistent effect even when the mechanism is totally unknown, and the limited information they rely on automatically gives them low power to detect small effects, or effects that work only under certain circumstances of which we’re ignorant.
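          The “low power” claim can be made concrete with a standard back-of-the-envelope power calculation for a two-arm trial (illustrative sample sizes and effect sizes only, not taken from any particular acupuncture study):

```python
import math

def power(d, n_per_arm, z_alpha=1.96):
    """Approximate power of a two-arm, two-sided z-test at alpha = 0.05
    for a standardized effect size d (outcome sd assumed to be 1)."""
    se = math.sqrt(2.0 / n_per_arm)   # SE of the difference in arm means
    z = d / se                        # expected z-statistic
    # standard normal CDF via the error function
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return phi(z - z_alpha) + phi(-z - z_alpha)

for d in (0.1, 0.3, 0.8):             # small, modest, and large effects
    print(f"effect d = {d:.1f}: power with 50 per arm = {power(d, 50):.2f}")
```

          A 50-per-arm trial has decent power only for the large effect; for a small effect it will miss the signal almost every time, which is consistent with the complaint that such trials can only ever settle questions about large, consistent effects.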

  10. I have no idea whether acupuncture actually works or not. But what has always interested me about it is that if you go to a good acupuncturist, there is a detailed map of the body, showing not only exactly where to put the needles for different conditions, but also the organ(s) affected by doing so, as well as the “energy” pathway to those organs. It is an extremely precise, detailed, and complicated system.

    The historical, or sociological, question becomes: how did such a system develop? One possibility is that, at least as done in China, acupuncture does work, that the system has been refined by a lot of trial and error over time, and that there is a lot we don’t know about how the body works that is captured in acupuncture. I don’t know if that is true, but it is certainly a reasonable explanation for how a complex system may have developed. The other possibility is that acupuncture is mostly hooey, but then it would be equally fascinating to try to trace how something that complicated, that doesn’t work, actually developed. In the first case we would possibly learn more about the human body; in the second we would learn a lot about how human knowledge and beliefs develop.

    • There’s no shortage of extremely complex, beautifully intricate, but empirically empty systems that people have developed (i.e., your second possibility). See, e.g., the rituals of orthodox Hinduism or Judaism. Robert Sapolsky’s “The Trouble with Testosterone and Other Essays on the Biology of the Human Predicament” discusses this, and it is one of my favorite popular science books.

    • With infinite data, would an adaptive computer algorithm evolve acupuncture?

      This is to say that acupuncture could work even if the rationalization is as bogus as the discovery is serendipitous.

      PS: I don’t think it is “real real”.

  11. Procedures with wildly inconsistent results are exactly the sort of things that give rise to ritualism. Think of all the superstition in sports, or bad advice you can get from a western doctor. The instinct is “this worked the last three times, I must be doing something slightly wrong” and not “Wow, by this point my mental model has like 30 different parameters and I only play 16 games a year, I’m on pretty shaky ground.”
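      The “30 parameters, 16 games” point is just overfitting, and it is easy to demonstrate with simulated data (purely hypothetical numbers): with more candidate explanations than observations, you can “explain” outcomes that are pure noise perfectly, and the resulting model is useless on the next season.

```python
import numpy as np

rng = np.random.default_rng(1)

# 16 "games" per season, 30 candidate ritual "parameters" per game,
# and outcomes that are pure noise (the rituals do nothing at all).
n_games, n_params = 16, 30
X = rng.normal(size=(n_games, n_params))   # which rituals were performed
y = rng.normal(size=n_games)               # pure-noise outcomes

# Least squares with more parameters than observations fits the noise
# exactly (minimum-norm solution).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
in_sample_err = np.abs(y - X @ beta).max()

# A fresh season: the fitted "model" of the rituals predicts nothing.
X_new = rng.normal(size=(n_games, n_params))
y_new = rng.normal(size=n_games)
out_sample_sd = np.std(y_new - X_new @ beta)

print(f"in-sample max error: {in_sample_err:.2e}")
print(f"out-of-sample residual sd: {out_sample_sd:.2f}")
```

      The in-sample fit is essentially perfect while the out-of-sample residuals are at least as noisy as the raw outcomes, which is exactly the “this worked the last three times” trap.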

      • Compare acupuncture to one simple aspect of religion: prayer. Plenty of people believe that prayer has a “real real” effect. If some research shows that prayer does nothing, then those people were simply not praying “right”, just as western acupuncture clearly doesn’t do it right.

        Any time you have an effect which can produce “working” anecdotes, once it takes hold of a human brain, very hard work is required to convince them that anecdotes are not data.

        And I don’t see much difference between the supernatural aspects of religion and acupuncture (aside from one having been known not to work for a significantly longer period of time). Neither group of believers concerns itself much with *how* these things could work, and for most it’s just a fact of life that they do, just like the TV does. Slightly more aware ones will say something along the lines of “we just don’t understand yet how it works, but that’s our fault and doesn’t diminish the fact that it does”.

      • Yet there’s a strong correlation between nationality and religion, and if somebody were to tell me that nationality is supernatural, I would have a hard time arguing with them; nations are about as real as ghosts. They exist only inside our heads, although of course they both have effects, because our heads happen to control our hands.

        My model for this would be, it’s not that religious people are magical thinkers; it’s that humans are magical thinkers, and the ways that our magical thinking manifests depend, in part, on our cultures.

  12. Acupuncture is used for pain management in some non-human animals, and there are a couple organizations that have endorsed it as a possible treatment modality. Wikipedia’s article on veterinary acupuncture (https://en.wikipedia.org/wiki/Veterinary_acupuncture) has a nice overview of the practice and some relevant references.

    I think this is actually a very important data point, because a lot of the possible explanations for why acupuncture really works in humans don’t clearly apply to cats, dogs, or horses.

  13. Also, studies that show that real acupuncture works as well as sham acupuncture are not hard to find. In most cases, that finding is used to conclude, not that acupuncture doesn’t work beyond the placebo effect (which we would shorten to “it doesn’t work”), but that both modalities work. That’s the crazy world of alternative medicine for you.

    • The problem is that there is no one thing called acupuncture, and especially no one thing called sham acupuncture. A good way to study this would be to find some area where acupuncture supposedly works (say, allergic rhinitis or arthritis pain management) and posit a mechanism (say, that it induces the release of several important hormones that then regulate the immune system or the pain-sensing nervous system). Then interview patients who have had acupuncture and find those who claim a benefit; interview the acupuncturists’ other patients to find out what fraction of the patients with this disorder find benefit; narrow down to several acupuncturists who have high success rates and standardize on the treatments those acupuncturists use; and find practitioners with low success rates and standardize a “sham” treatment based on the things the low-success people do (that is, it’s acupuncture, but it’s “bad” acupuncture). Next, find people in the general population who have the complaints and have never tried acupuncture, randomize them to “bad” vs. “effective” under these definitions, and come up with measurements that get at the mechanism (blood levels of hormones, nerve conduction tests, blood levels of immune system cells, and so on). Do before-after measurements of these on both groups, track the reported success as well as the objective measures of the important mechanistic results (the hormones and immune cells and nerve conduction, etc.), and try to find out whether “real” vs. “sham” produce different reported outcomes and whether those reported outcomes are related to the objective outcomes. And, by the way, try to compare the outcomes with medical treatments like allergy pills or anti-inflammatory drugs.

      This kind of actual science just isn’t being done, as far as I can tell. The literature, full of little RCT-type trials, is a joke compared to this kind of mechanistic study, where someone tries to find out not only *if* it works but *how* it works, and to rule out different mechanisms by looking for real, measurable differences in the biological causes.
