Why I’m not participating in the Transparent Psi Project

I received the following email from psychology researcher Zoltan Kekecs:

I would like to ask you to participate in the establishment of the expert consensus design of a large scale fully transparent replication of Bem’s (2011) ‘Feeling the future’ Experiment 1. Our initiative is called the ‘Transparent Psi Project’. [https://osf.io/jk2zf/wiki/home/] Our aim is to develop a consensus design that is mutually acceptable for both psi proponent and mainstream researchers, containing clear criteria for credibility.

I replied:

Thanks for the invitation. I am not so interested in this project because I think that all the preregistration in the world won’t solve the problem of small effect sizes and poor measurements. It is my impression from Bem’s work and others that the field of psi is plagued by noisy measurements and poorly specified theories. Sure, preregistration etc. would stop many of the problems–in particular, there’s no way that Bem would’ve seen 9 out of 9 statistically significant p-values, or whatever that was. But I can’t in good conscience recommend the spending of effort in this way. I think any serious work in this area would have to go beyond the phenomenological approach and perform more direct measurements, as for example here: http://marginalrevolution.com/marginalrevolution/2014/11/telepathy-over-the-internet.html . I’ve not actually read the paper linked there so this may be a bad example but the point is that one could possibly study such things scientifically with a physical model of the process. To just keep taking Bem-style measurements, though, I think that’s hopeless: it’s the kangaroo problem (http://statmodeling.stat.columbia.edu/2015/04/21/feather-bathroom-scale-kangaroo/). Better to preregister than not, but better still not to waste time on this or similarly-hopeless problems (studying sex ratios in samples of size 3000, estimating correlations of monthly cycle on political attitudes using between-person comparisons, power pose, etc.). I recognize that some of these ideas, ESP included, had some legitimate a priori plausibility, but, at this point, a Bem-style experiment seems like a shot in the dark. And, of course, even with preregistration, there’s a 5% chance you’ll see something statistically significant just by chance, leading to further confusion. In summary, preregistration and consensus help with the incentives, but all the incentives in the world are no substitute for good measurements.
(See the discussion of “in many cases we are loath to recommend pre-registered replication” here: http://statmodeling.stat.columbia.edu/2017/02/11/measurement-error-replication-crisis/).

Kekecs wrote back:

Thank you for your feedback. We fully realize the problem posed by small effect size. However, this problem in itself can be solved simply by throwing a larger sample at it. In fact based on our simulations we plan to collect 14,000-60,000 data points (700 – 3,000 participants) using Bayesian analysis and optional stopping, aiming to reach a Bayes factor threshold of 60 or 1/60. Our simulations show that using these parameters we only have a p = 0.0004 false positive chance, so it is highly unlikely that we would accidentally generate more confusion in the field just by conducting the replication. On the contrary, by doing our study, we will effectively more than double the amount of total data accumulated so far by Bem’s and others’ studies using this paradigm, which should help with clarity in the field by introducing good quality, credible data.

You might be right though that the measurement itself is faulty, and that we cannot expect precognition to work in an environmentally invalid situation like this. But in reality, we don’t have any information on how precognition should work if it really does exist, so I am not sure what would be a better way of measuring it than seeing how effective people are at predicting future events.

Our main goal here is not really to see whether precognition exists or not. The ultimate aim of our efforts is to do a proof of concept study where we will see whether it is possible to come to a consensus on criterion of acceptability and credibility in a field this divided, and to come up with ways in which we can negate all possibilities of questionable research practice. This approach can then be transferred to other fields as well.
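
For readers who want to see what a design like this looks like in practice, here is a minimal sketch of a Bayes factor analysis with optional stopping. It is not the project's actual analysis: the Beta(1, 1) prior under the alternative, the batch size, the trial cap, and the function names are all assumptions made for illustration. Only the chance level of 0.5 and the stopping thresholds of 60 and 1/60 come from the email above.

```python
import math
import random

def log_beta(a, b):
    """Log of the Beta function, computed with log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_bf10(hits, n, a=1.0, b=1.0):
    """Log Bayes factor for H1: p ~ Beta(a, b) versus H0: p = 0.5.

    The Beta(1, 1) default is an illustrative assumption, not the
    project's prior.
    """
    log_m1 = log_beta(a + hits, b + n - hits) - log_beta(a, b)
    log_m0 = n * math.log(0.5)
    return log_m1 - log_m0

def run_sequential(p_true, rng, max_n=6000, batch=100, threshold=60.0):
    """Add trials in batches; stop once BF10 >= threshold or <= 1/threshold."""
    hits = 0
    n = 0
    log_t = math.log(threshold)
    while n < max_n:
        hits += sum(rng.random() < p_true for _ in range(batch))
        n += batch
        lbf = log_bf10(hits, n)
        if lbf >= log_t:
            return "H1", n
        if lbf <= -log_t:
            return "H0", n
    return "inconclusive", n

def false_positive_rate(n_sims=200, seed=1):
    """Share of simulated null experiments (p = 0.5) that end in favor of H1."""
    rng = random.Random(seed)
    outcomes = [run_sequential(0.5, rng)[0] for _ in range(n_sims)]
    return outcomes.count("H1") / n_sims

if __name__ == "__main__":
    print("estimated false positive rate:", false_positive_rate())
```

A simulation like this only speaks to sampling error under an exactly specified null. It says nothing about leakage or measurement problems, which is the concern raised in the replies that follow.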

I then responded:

I still think it’s hopeless. The problem (which I’ll say using generic units as I’m not familiar with the ESP experiment) is: suppose you have a huge sample size and can detect an effect of 0.003 (on some scale) with standard error 0.001. Statistically significant, preregistered, the whole deal. Fine. But then you could very well see an effect of -0.002 with different people, in a different setting. And -0.003 somewhere else. And 0.001 somewhere else. Etc. You’re talking about effects that are indistinguishable given various sources of leakage in the experiment.

I support your general goal but I recommend you choose a more promising topic than ESP or power pose or various other topics that get talked about so much.
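
The point about effects that change sign across settings can be sketched with a small simulation. The standard error of 0.001 comes from the generic example above. The between-setting standard deviation of 0.003 and the assumption that the true effect averages zero across settings are hypothetical choices made only for illustration.

```python
import random

def simulate_settings(n_settings=2000, sd_between=0.003, se=0.001, seed=0):
    """Simulate one very precise study in each of many settings.

    The true effect in each setting is drawn from Normal(0, sd_between),
    so the average effect is zero (an illustrative assumption).
    Returns the share of studies that are significantly positive and
    the share that are significantly negative at the usual 1.96 cutoff.
    """
    rng = random.Random(seed)
    positive = 0
    negative = 0
    for _ in range(n_settings):
        true_effect = rng.gauss(0.0, sd_between)
        estimate = rng.gauss(true_effect, se)
        z = estimate / se
        if z > 1.96:
            positive += 1
        elif z < -1.96:
            negative += 1
    return positive / n_settings, negative / n_settings

if __name__ == "__main__":
    pos, neg = simulate_settings()
    print("significantly positive:", pos, "significantly negative:", neg)
```

Under these assumptions a large share of studies come out "statistically significant" in each direction, even though every one of them is well powered and honestly analyzed. A single clean result of 0.003 with standard error 0.001 would not tell you what to expect elsewhere.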

Kekecs replied:

We are already committed to follow through with this particular setting. But I agree with you that our approach can be easily transferred to the research of other effects and we fully intend to do that.

If you put it that way, your question is all about construct validity. Whether we can detect the effect that we really want to detect, or are there other confounds that bias the measurement. In this particular experimental setting which is simple as stone (basically people are guessing about the outcomes of future coin flips) the types of bias that we can expect are more related to questionable research practices (QRPs) than anything else. The only way other types of bias, such as personal differences in ability (sampling bias), participant expectancy, and demand characteristics, etc., can have an effect is if there is truly an anomalous effect. For example if we detected an effect of 0.003 with 0.001 SE only because we accidentally sampled people with high psi abilities, our conclusion that there is a psi effect would still be true (although our effect size estimate would be slightly off).

That is why in this project we are focusing mainly on negating all possibilities of QRPs and full transparency. I am not sure what other types of leakage can we have in this particular experiment if we addressed all possible QRPs. Would you care to elaborate?

I responded:

Just in answer to that last question: I’m not sure what other types of leakage might exist—it’s my impression that Bem’s experiments had various problems, so I guess it depends how exact a replication you’re talking about. My real point, though, is if we think ESP exists at all, then an effect that’s +0.003 on Monday and -0.002 on Tuesday and +0.001 on Wednesday probably isn’t so interesting. This becomes clearer if we move the domain away from possible null phenomena such as ESP or homeopathy, to things like social priming, which presumably has some effect, but which varies so much by person and by context to be generally unpredictable and indistinguishable from noise. I don’t think ESP is such a good model for psychology research because it’s one of the few things people study that really could be zero.

And then Kekecs closed out the discussion:

In response, I find doing this effort in the field of ESP interesting exactly because the effect could potentially be zero. Positive findings have an overwhelming dominance in both psi literature, and social sciences literature in general. In the case of most other social science research, it is a theoretical possibility (but unrealistic) that researchers just get lucky all the time and they always ask the right questions, that is why they are so effective in finding positive effects. Again, this obviously cannot be true for the entirety of the literature, but for each topic studied individually, it can be quite probable that there is an effect if ever so small, which blurs the picture about publication bias and other types of bias in the literature. However, it may be that there is no ESP effect at all. In that case, we would have a field where the effect of bias in research can be studied in its purest form.

From another perspective, precognition in particular is a perfect research topic exactly because these designs by their nature are very well protected from the usual threats to internal validity, at least in the positive direction. It is hard to see what could make a person perform better at predicting the outcome of a state of the art random number generator if there is no psi effect. Bias can always be introduced by different questionable research practices (QRPs), but if we are able to design a study completely immune to QRPs, there is no real possibility for bias toward type I error. Of course, if the effect really exists, all the usual threats to validity can have an influence (for example, it is possible that people can get “psi fatigue” if they perform a lot of trials, or that events and contextual features, or even expectancy can have an effect on performance), but we cannot make a type I error in that case, because the effect exists, we can only make errors in estimating the size of the effect, or a type II error.

So understanding what is underlying the dominance of positive effects in ESP research is very important. If there is no effect, psi literature can serve as a case study for bias in its purest form, which can help us understand it in other research fields. On the other hand, if we find an effect when all QRPs are controlled for, we may need to really rethink our current paradigm.

I continue to think that the study of ESP is irrelevant for psychology, both for substantive reasons—there is no serious underlying theory or clear evidence for ESP, it’s all just hope and intuition—and for methodological reasons, in that zero is a real possibility. In contrast, even silly topics such as power pose and embodied cognition seem to me to have some relevance to psychology and also involve the real challenge that there are no zeroes. Standing in an unusual position for two minutes will have some effect on your thinking and behavior; the debate is what are the consistent effects, if any. That’s my take, anyway; but I wanted to share Kekecs’s view too, given all the effort he’s putting into this project.

61 thoughts on “Why I’m not participating in the Transparent Psi Project”

  1. The lack of theory in this case makes me apprehensive. Any null or non-null finding doesn’t mean much (or anything) if the hypotheses tested weren’t generated from overarching theory (imo).

    • Actually this is one of the cases where NHST is ok (or at least not a total waste of time). Many people really do believe the effect is exactly zero and that any real deviation would be interesting. It is just a matter of running a good enough experiment.

      • I assume Nick is referring to some underlying mechanistic (i.e. *scientific*) theory, not the “theory” that the effect is/is different from 0. A mechanistic theory would generate quantitative predictions and would give us insight into *why* this effect may exist, whereas this whole “the effect is different from 0” theorizing is not scientific but instead statistical. In short, the information gained from such a study does not inform our understanding of psychology at all.

    • Also, I’m partial to the star trek idea that if psi abilities exist it makes sense the effect is very small, since the “power” is also a vulnerability (other beings could more easily take over your body). IE, perhaps in the past there was some huge war fought with psi-weapons that wiped out all the species with meaningful abilities, and today we are all descended from those who were “resistant”.

    • I’d say the abundance of empirically established theory* being defied in this case (and others like it) is the greater cause for concern. It’s that which justifies an enormously strong sceptical prior in favour of the null – one strong enough to render the experiment(s) futile cargo cult science.**

      * http://www.preposterousuniverse.com/blog/2008/02/18/telekinesis-and-quantum-field-theory/
      ** http://www-biba.inrialpes.fr/Jaynes/cc05e.pdf

      • I really can’t believe I am in the position of defending ESP but really what is the problem? Most likely there are insane amounts of information being invisibly transmitted through your body right now (wifi, bluetooth, tv, radio, etc).

        Why is it so ridiculous that organisms could have developed similar abilities? The answer: It isn’t a crackpot idea. Maybe the people running experiments about this topic suck at their jobs, but the basic idea is fine and the typical person trying to eg cure cancer sucks just as much. Also, I looked at your first link and (as is noted by the title) it is about telekinesis?

        • To me the problem is the overconfidence that extremely large experiments can be implemented and analysed with no leakage at all.

          One can hope to design them with no leakage – but if there is just a bit – you or someone will be misled.

          The underlying problem might be a false sense of privilege that we can get answers in areas so vaguely understood by brute force – by putting (super)nature(al) on the rack.

          Now once there is some understanding and theory which support a palpable effect – then no problem.

          Also my link below did defend ESP.

        • The problem is that Bem’s ‘Feeling the future’ Experiment 1 purports to be a test of the extremely crackpot idea that is retrocausal ESP. And even causal ESP – telepathy – is actually a quite crackpot idea, as you will see if you do read the rest of that first link (which isn’t just about telekinesis). But you’re right that it’s not as problematic as retrocausal ESP or telekinesis…

          “A great surprise of the early work was that PK affected only rolling dice, but could not be measured as a force acting on a stationary die on a sensitive scale. PK seemed to act only where chance processes were involved. This suggested that PK could not be considered as a force, comparable to electric or magnetic forces.” http://www.fourmilab.ch/rpkp/strange.html

          /o\

        • and even causal ESP – telepathy – is actually a quite crackpot idea, as you will see if you do read the rest of that first link (which isn’t just about telekinesis).

          I will check the link, but this seems impossible to me. Right now I could become “telepathic” by getting some kind of implant that receives wifi and converts to a sensation (eg sound). Telepathy is purely an engineering problem.

        • What “seems impossible”? The fact that telepathy isn’t impossible from the perspective of physics alone doesn’t make the implicit claim that nature has already engineered a microwave transmitter/receiver for us non-crackpot.

        • As Sean Carroll explains in the first link in the post you responded to, physicists have proven (to a ridiculously high probability) that ESP cannot exist. It is physically impossible. Therefore, no matter how much its development might be helpful to a species, the physical laws that govern the universe do not allow for it.

        • physicists have proven (to a ridiculously high probability) that ESP cannot exist. It is physically impossible.

          I read it. Meanwhile I’ll be using wifi, cell phone, etc to transfer information undetectable to my/your senses. This argument is ridiculous in the modern age because there clearly is extra-sensory information all around you right now. As mentioned above, it is possible today to get some kind of implant that will pick up extrasensory signals and use that to affect your experience.

          Extrasensory perception, ESP or Esper, also called sixth sense or second sight, includes reception of information not gained through the recognized physical senses but sensed with the mind

          https://en.wikipedia.org/wiki/Extrasensory_perception

          You can also imagine two robots communicating directly via wifi, they would appear to be “psychic” by every definition.

        • You’ve done an amazing job of misinterpreting the meaning of ESP and that article. Do you seriously believe that parapsychologists are interested in whether people can hear signals using man-made devices [that’s what we call a rhetorical question]? Evidently, you do not, since even you had to put “telepathy” in scare quotes in order to broaden the meaning to the point of meaninglessness.

        • Imagine people can read other people’s minds because there’s a specific electrical signal that’s emitted by their neural functions that some people are sensitive to. This is basically what Anoneuoid means, not that some people have special man-made implants, but that there’s a natural mechanism to pick up information from remote sources which we aren’t aware of. Unlike retro-causal future prediction or telekinesis, this isn’t ruled out by the known laws of physics.

        • It would depend very heavily on the precise definitions involved; I think that’s Anoneuoid’s point.

          For example, if these people can detect pigments in skin that change color subtly when people lie or are aroused or whatever http://discovermagazine.com/2012/jul-aug/06-humans-with-super-human-vision

          and you define your tests to include that sort of thing… then it wouldn’t take much to get me to take the bet. But if your bet is that two people in separate solid steel Faraday cages can synchronize which letters they’re thinking of reliably… pretty much forget it.

        • I meant that “people can read other people’s minds because there’s a specific electrical signal that’s emitted by their neural functions that some people are sensitive to.”

          I’d say that 10^-9 would be a very generous upper bound on the probability that that is true.

        • Imagine people can read other people’s minds because there’s a specific electrical signal that’s emitted by their neural functions that some people are sensitive to. This is basically what Anoneuoid means, not that some people have special man-made implants, but that there’s a natural mechanism to pick up information from remote sources which we aren’t aware of. Unlike retro-causal future prediction or telekinesis, this isn’t ruled out by the known laws of physics.

          Thanks Daniel. I would add that allowing others to “read your mind” should be selected against (since it would impede the ability to deceive) so we should expect such signals to be highly encrypted/compressed. Insofar as this is correct the signals should look a lot like noise. I am totally unfamiliar with how they rule out EM signals passing between people, but surely there is some kind of “background noise” always present that may be including this information.

          One experiment I’d like to see from proponents is whether this purported ability drops off by an inverse square law, which would suggest EM signals. My impression is that the methods used to check for this phenomenon are extremely crude, but perhaps such a study exists?

          Also, anyone who has had this done to them knows that other people can indeed control your body via invisible EM signals:
          https://www.youtube.com/watch?v=6VrkkBCbwvM

          It is really creepy.

      • Does anyone know what fake news Jaynes is referring to here?

        The information we get from the TV evening news is not that a certain event actually happened in a certain way; it is that some news reporter has claimed that it did. Even seeing the event on our screens can no longer convince us, after recent revelations that all major U.S. networks had faked some videotapes of alleged news events.

        http://www-biba.inrialpes.fr/Jaynes/cc05e.pdf

        • I can’t pull anything up on Google.
          …consider the irony if it’s actually false but Jaynes believed it because someone he trusted claimed that it was true…

        • He may have had in mind the Dateline NBC report that had faked video of a pickup truck exploding, which was in ’92 or ’93.

  2. I find this profoundly sad. There are so many issues in the world that deserve our attention and study. Psychology has the potential to actually do research that affects real people’s lives for the better. Throwing resources at researching a phenomenon we all know does not exist just because it was published in JPSP feels like a waste of everyone–including participants’–time. The most depressing thing about the “replication crisis” is that someone publishes a catchy, cute priming or embodied social cognition effect, which has little theoretical or real-world meaning, and then we have to spend valuable resources performing replications to see if there is actually something there. But replications are hard and measurements are noisy, so it ends up being an absurd amount of resources thrown at something that doesn’t really matter, in my opinion.

    The whole state of psychology is one of the reasons that I’m leaving academia after getting my PhD, despite having a solid publication record. A guy who openly advocated for p-fishing argues in JPSP that psi is real. Now, psychologists are throwing resources at trying to replicate it. I feel like it is shirking our responsibility to society.

    I also agree with everything you said here, but I felt like ranting after seeing that someone wanted to throw 700 – 3,000 resources at psi instead of, you know, a phenomenon that could help people.

    • Mark:

      I agree—but I have to admit I’d be on stronger ground here if I didn’t spend so much time on sports analytics. Given that, it’s hard for me to make a hard argument against people wasting their time studying ESP—but I do think it’s a waste of time. Say what you want about the hot hand—it’s not hard to make a good argument that it too is a waste of time—but at least it’s real.

      • Yeah, I don’t mean to come off too dogmatic or like we can’t have fun—I myself do a good amount of tinkering around with analyzing sports data, mining lyrics, looking at Rotten Tomatoes and Metacritic scores, etc. But those resources are my free time, are me just webscraping while listening to podcasts.

        I think why I had such a negative reaction to it was the “14,000-60,000 data points (700 – 3,000 participants)” part. That is a whole lot of participant time, brainpower, etc., for investigating something Wikipedia refers to as a “pseudoscience.” Like you said, even if one is looking at something (relatively) trivial, like sports, at least sports are real.

      • “Say what you want about the hot hand—it’s not hard to make a good argument that it too is a waste of time—but at least it’s real.”

        GS: A wise man once said that science advances by finding similarities – differences are a dime-a-dozen. What would be the larger context of similar phenomena WRT the “hot hand”? What sort of phenomenon is it? The answer to that question may go to any scientific importance researching “it” (and what is “it” again?) may have.

    • I wholeheartedly agree with you but what you are missing is that in the field of esoteria people don’t find these sorts of things unnecessary – even if the effect size would be minuscule – since they see this as a revelation of a greater truth that can potentially save the human race from the nuclear war or whatever it is that they want us to be saved from. It is not about the insignificant psi effect, it is about the POSSIBILITY of it, which – to a large population of these guys and gals – proves it and opens the doors for whatever they are trying to prove. When we just concentrate our psi powers we can make the UFOs dance…

  3. There seems to be an obvious way to do this based upon the fact that psi supposedly varies among individuals, but I can’t tell if it is included in the design. Since the effect is small across a large population, the population should be successively winnowed based upon psi performance. In Round 1, everyone guesses. Those that seem to guess better than chance go on to Round 2. This continues until there is a predetermined population size of either “high psi” individuals, or lucky individuals. Then Phase 2 begins, first comparing the performance of the high psi folks to randomly selected groups in predicting coin flips, then other types of predictions, with enough discrimination that the lucky folks run out of luck. Phase 2 must produce a significantly higher effect size or psi is refuted.

    That being said, I agree with Andrew that a different phenomenon that varies across a population would be better because psi is inherently silly.
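
The winnowing design can be sketched under the null hypothesis, where nobody has any ability and every guess is right with probability 0.5. The population size, number of rounds, trials per round, and Phase 2 length below are hypothetical values, not part of any proposed protocol.

```python
import random

def hit_count(p, n_trials, rng):
    """Number of correct guesses in n_trials when each is right with probability p."""
    return sum(rng.random() < p for _ in range(n_trials))

def winnow_under_null(n_people=2000, n_rounds=3, trials_per_round=20,
                      phase2_trials=200, seed=0):
    """Select 'better than chance' guessers round by round, then retest them.

    Under the null every participant is identical (p = 0.5), so only the
    number of survivors needs to be tracked, not who they are.
    Returns the number of survivors and their pooled Phase 2 hit rate.
    """
    rng = random.Random(seed)
    survivors = n_people
    for _ in range(n_rounds):
        survivors = sum(
            hit_count(0.5, trials_per_round, rng) > trials_per_round / 2
            for _ in range(survivors)
        )
    if survivors == 0:
        return 0, float("nan")
    phase2_hits = sum(hit_count(0.5, phase2_trials, rng) for _ in range(survivors))
    return survivors, phase2_hits / (survivors * phase2_trials)

if __name__ == "__main__":
    n_left, rate = winnow_under_null()
    print("survivors:", n_left, "Phase 2 hit rate:", round(rate, 3))
```

With no real effect, the selected group looks impressive on the rounds used to select it and falls back to roughly 50% on fresh trials. That regression is what Phase 2 is meant to expose.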

    • The “winnowing” process reminds me (vaguely) of Larry Niven’s character Teela Brown in the Ringworld series, who was “selected” for luck (https://en.wikipedia.org/wiki/Teela_Brown). This did have some merit as an interesting literary device. But I agree with Andrew that declining to participate in designing psi experiments is a rational choice — lots better things to do with one’s time (including writing scifi, if one is so inclined).

  4. I can’t believe this trash pseudoscience is being entertained at all. Giving it even a moment’s time is dangerous and further promotes its legitimacy in not only the eyes of the charlatans that propose it, but even worse legitimises crackpot pseudoscience to the general non-scientific population. This is a Bayesian blog, so in that theme I wouldn’t accept anything wider than practically a Dirac delta prior on 0 effect given all our knowledge of biology, physiology, and physics.

    • legitimises crackpot pseudoscience to the general non-scientific population

      Have you read the comments about NHST on this blog? Any effect of this psi project is undetectable compared to that… The NHST problem is what people should be worked up about.

      • It’s easy to sit here and say that NHST is the worst thing, and that we should instead be writing down models with real predictions and then testing them…but the fact of the matter is that the theory is nowhere near good enough in the social sciences to proceed in this manner. It’s not like physics, there is no “general theory” in economics, or social psychology. How do you propose social scientists proceed? I agree that the null of no effect is stupid in a lot of contexts, but what is the alternative? The idea of having a process like: “write down a model.. it has “real” and “relevant” testable implications (i.e. X has exactly “b” effect on Y), you go test them”. This is just not feasible in, say, economics. Every model would just be rejected..and I don’t think we can really expect this to change – the structural parameters that govern human behavior are changing over time and space, so a model that seems suitable for the macroeconomy in the 2000s may not be suitable in the 2020s. So what is the alternative to NHST? I think that just not talking about hypothesis tests altogether is a good way to go, and indeed I think a lot of economics papers are written this way – you have some patterns in the data, write a model, estimate some parameters, tell a story. You are not “testing” anything, given that the model is basically written to formalize the patterns you found in the data. And perhaps the follow up papers to the one just described (which could be thought of as more exploratory) could go and test the general predictions of the theory proposed in the original paper in a different context. I’m just not sure, in the social science context, that there are hypotheses to be tested that aren’t just a null of no effect (or a one-sided effect).

  5. They plan on using Bayes Factors?

    Doesn’t the typical Bayes factor use one “hypothesis” that is super peaked at 0 and another “hypothesis” that permits effects to exist?

    If so, we can already say a-priori that the effect MUST be incredibly small, as a population parameter anyway. Wouldn’t the Bayes Factor “null hypothesis” always be accepted in this case? If you have a really peaked null prior (N(0,.002)) vs a non-null prior (N(0,.5)), the Null would most likely win, even IF there is an incredibly small effect.

    ESP is indeed one of those rare cases where I’d say NHST actually does make sense, if only because it uses a point-zero parameter as the null hypothesis. I’m not sure whether their BF analysis will really have a remote chance of ‘rejecting’ the null, so to speak, because the typical BF null prior would actually be what I would hypothesize the population ESP effect to be, should it exist. If ESP exists, and very few people have the ability and/or the effect is incredibly small, then the population parameter MUST be incredibly small, which the null BF hypothesis would actually permit; therefore, if ESP exists or doesn’t exist, I imagine the BF would actually nearly always support the ‘null’.

    Not that I endorse the use of BF anyway, to be frank; BFs drew me into Bayesian analysis, but I stayed for the posterior distributions.
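
The worry in this comment can be checked with a small calculation. The sketch below assumes a normal likelihood for the estimate and reads the commenter's N(0,.002) and N(0,.5) as standard deviations, which is a guess since the comment does not say. The estimate of 0.003 with standard error 0.001 is borrowed from the generic example in the post. None of this reflects the priors the project actually plans to use.

```python
import math

def normal_pdf(x, sd):
    """Density of a zero-mean normal distribution with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def bf_peaked_vs_diffuse(estimate, se, sd_peaked=0.002, sd_diffuse=0.5):
    """Bayes factor in favor of the peaked prior over the diffuse prior.

    With estimate ~ Normal(theta, se) and theta ~ Normal(0, sd_prior),
    the marginal distribution of the estimate is
    Normal(0, sqrt(sd_prior**2 + se**2)).
    """
    marginal_peaked = normal_pdf(estimate, math.hypot(sd_peaked, se))
    marginal_diffuse = normal_pdf(estimate, math.hypot(sd_diffuse, se))
    return marginal_peaked / marginal_diffuse

if __name__ == "__main__":
    # An estimate three standard errors from zero, but tiny in absolute terms.
    print(bf_peaked_vs_diffuse(estimate=0.003, se=0.001))
    # A much larger estimate, which the peaked prior cannot accommodate.
    print(bf_peaked_vs_diffuse(estimate=0.05, se=0.001))
```

Under these assumed priors, a tiny but precisely estimated effect still favors the peaked "null", which matches the commenter's intuition. The result depends heavily on the width chosen for the diffuse prior.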

  6. “…even silly topics such as power pose and embodied cognition…”

    GS: Someone should ask Gelman how one of the first sensible views of cognition (whatever the eff that is) to arise *from within* mainstream psychology (they do, after all, claim to be interested in “cognition” – whatever the eff that is) is “silly.” Embodied cognition (EC) is often thought of as starting with Gibson but one of the first great experiments was in 1963 (Held & Hein). Briefly, pairs of kittens [Oh I guess you’d have to agree that some non-humans are capable of “cognition” – whatever the eff that is.] were reared in total darkness except during frequent experimental sessions. During those sessions, each pair of kittens was placed in an apparatus that consisted of two “gondolas” affixed to uprights that were connected to a central t-shaped piece in a large round chamber painted with white and black vertical stripes. The kittens went into the “gondolas” but one of the pair’s legs protruded from the bottom of it. That kitten (the “active kitten”) could walk around the chamber both clockwise and counter-clockwise and its motion, of course, was transmitted to the other gondola. The other kitten (the “passive kitten”) was moved through space in exactly the same way as the “active kitten.” The nature of much of the visual stimulation for the two kittens had to be similar to each other’s but this stimulation was a function of the behavior only of the “active kitten.” The results were utterly dramatic – in the “visual cliff” test, for example, nearly all (or all) of the passive kittens walked “immediately” onto the cliff whereas the latencies of this behavior were high (maybe some never would walk out on it – been a long time since I read it) for the active kittens. And there were other vision tests. The bottom line: only the kittens for whom visual stimulation in the experimental session was a function of its behavior could see worth a damn. 
    That is, vision (and by extension perception in general) requires a body (as in “embodied”) that moves around the world. Simply exposing an animal to the same (well, similar) stimulation does not result in functional vision. Wow! That IS silly!

    • Not even any of Gelman’s Minions will answer the above question? Hmm, just like some of the hapless psychologists that Gelman routinely attacks (not that they don’t deserve it mind you) probably never answer the criticism…

  7. To me, the futility of the experiment is that the prior probability of the null hypothesis is so high that, if the study has a positive outcome, the probability that the outcome is false will be overwhelmingly greater than the probability that it is true. To put it plainly, the truth is known in advance. Thus the study is a test of whether an ESP experiment can be run with negligible leakage or QRPs. Given the history of parapsychology this is an open question. Whether it’s a question worth committing resources to answer is less clear.
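
The arithmetic behind this argument is Bayes' rule in odds form: posterior odds equal prior odds times the Bayes factor. The sketch below uses the project's stated threshold of 60 as the Bayes factor. The prior probabilities are hypothetical values chosen for illustration, not numbers the commenter gave.

```python
def posterior_probability(prior, bayes_factor):
    """Posterior probability of the alternative, given its prior probability
    and the Bayes factor in its favor (posterior odds = prior odds * BF)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)

if __name__ == "__main__":
    # Hypothetical priors: an agnostic reader and a strongly skeptical one.
    for prior in (0.5, 1e-3, 1e-6):
        print(prior, posterior_probability(prior, bayes_factor=60.0))
```

For a reader who starts out strongly skeptical, a Bayes factor of 60 still leaves the alternative very improbable, so error or leakage remains the more likely explanation of a positive result.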

  8. The problem with just getting a bigger sample size is that the measurements will probably be trash. Companies, which actually have a financial interest in the results, have a hard time with quality control… how can anyone think a bunch of researchers could possibly deal with that?

  9. ESP is the kind of thing for which a sample of SIZE ONE should be enough to demonstrate anything interesting. It’s complete nonsense to test something in 60K people if you can’t even demonstrate it in one.

    • This is my thinking too. If someone doubted that memory is real, you could demonstrate its existence easily with a random person on the street. Show them a card and put it at the bottom of the deck where they can’t see it. Now ask them what it was. A few people will sometimes fail in the excitement, but the demonstration will be unambiguous within a few cards. Okay, now ask someone what card you are going to show them. Suppose someone lucks out and guesses correctly (the odds are 1 in 52, after all). As in the other case, it will not take long before they demonstrate they don’t have the ability you’re testing for, namely, prescience.

      • Exactly, if a researcher can’t demonstrate an effect with one person, how can they justify reaching out to thousands? You usually go to the thousands because you find an effect in a group and you want to see if this generalizes to other groups. It’s not the other way around. If you can’t find ESP in a single person/group, getting a huge sample size is meaningless.
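
The card demonstration in this thread can be made concrete with a short sketch. It assumes the deck is reshuffled before every guess, so each guess is an independent 1-in-52 shot; the function name is hypothetical.

```python
from math import comb


def chance_of_at_least(hits, trials, p=1 / 52):
    """Probability of getting `hits` or more correct guesses in `trials`
    independent attempts by luck alone (binomial upper tail)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# How quickly luck runs out for someone guessing cards before they are shown.
for hits in (1, 2, 3):
    print(f"at least {hits} of 10 correct by chance: {chance_of_at_least(hits, 10):.6f}")
```

A lucky first guess is unremarkable, but repeated hits become vanishingly unlikely under chance, so a handful of trials with one person would settle the matter either way.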

  10. Studying ESP scientifically is worse than hopeless, it’s impossible. The fundamental problem with ESP is that it has a self-defeating definition. As soon as a mechanism is discovered or identified that can explain any capability previously ascribed to ESP, the capability is *by definition* no longer ESP. Any experiment showing an effect that can be studied in any compositional way is a failure as ESP qua ESP.

    Pheromones are a fairly useful example. Until they were discovered, path finding by insects and selection of receptive mates had a large mysterious component that was in romantic music and literature attributed to “witchcraft” and other ESP-like phenomena. Now, we don’t know the detailed mechanisms by which oxytocin levels are transformed into attachment behaviors and mental states, but there’s a clear scientific program to figure it out, which may take many decades, without anyone being inclined to stick ESP somewhere into the processes just because psychiatrists and philosophers have zero clues about how neurochemical flows become thoughts and feelings.

    When I was a first-year grad student, I wanted to study an area where the scientific background was poorly developed. All the experiments that I came up with were simple searches for main effects. My adviser nixed these studies, pointing out that to make a significant, publishable impact, a study should address competing theories that make different predictions. A good study would provide evidence for one theory vs another, which would show up in the interaction terms of the data analysis. This was good advice, but hard to follow in a prescientific field.

    Certain experiments in quantum mechanics show effects that appear to be retrocausal (look up “delayed choice” and “quantum eraser”), but arguing that X is mysterious, and Y is mysterious, therefore X=Y, is a non-starter. A credible theory linking those situations to the scales of human behavior will need to show an evolutionary adaptive advantage and cellular function similar to that shown for quantum entanglement in photoreception and photosynthesis. Good luck with that!

    • I’ll also add my semi-related “The effect of information on guessing the color of playing cards” (http://www.statisticool.com/cards.htm), a toy example showing that when you add in extra information you can increase your chances. I think a lot of so-called ‘psi’ researchers are doing this type of thing unknowingly, inflating the effect sizes in what are probably also poorly designed studies.

  11. The study has now been published: https://royalsocietypublishing.org/doi/full/10.1098/rsos.191375

    As expected, very precisely estimating zero effect; also, no sheep-goat effect.

    I’m not as pessimistic as Andrew is, however: I think this is a pretty interesting experiment that was worth doing, not merely because it includes Bem & psi proponents among the designers and so will have to be taken seriously by the handful of psi people left, but as a test bed for all of the experimental design techniques they use. Andrew doesn’t mention anything beyond pre-registration, and I don’t know if they were part of the design yet, but they go well beyond just pilot-then-pre-registration-plus-open-data to adversarial collaboration involving psi proponents, video-taping of the experiments, live pushing of the data to version-controlled repos on Github, and bringing in IT auditors to check the software & data. (Nor does it wind up being a flagrant waste of resources, at least going by the cost-benefit section where it all costs ~$44k; whereas, I would guess that several orders of magnitude more money has been spent just *publishing* papers about Bem 2011… Good methodology is ‘pound-wise, penny-foolish’.)

    I think that’s very interesting, and psi is the perfect topic on which to try this stuff out in the real world, precisely because there’s no real effect. In neuroscience, ecommerce, physics, etc it’s always a good idea to run an ‘A/A test’ on your fancy new experiment implementation; likewise here…
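
For readers who haven't met the term, an A/A test runs the whole pipeline on two groups that are treated identically, so any "effect" it reports is a false positive. Below is a minimal simulation sketch of the idea. The group size, hit rate, and number of runs are arbitrary assumptions, and none of this is the study's actual procedure.

```python
import random
from math import erf, sqrt


def two_sided_p(hits_a, hits_b, n):
    """Two-proportion z-test p-value for two groups of equal size n."""
    pooled = (hits_a + hits_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0
    z = (hits_a - hits_b) / n / se
    # Two-sided tail of the standard normal: 2 * (1 - Phi(|z|)).
    return 1 - erf(abs(z) / sqrt(2))


def aa_false_positive_rate(runs=2000, n=500, p=0.5, alpha=0.05, seed=0):
    """Fraction of A/A comparisons flagged as significant.

    Both groups share the same true hit rate p, so every flag is spurious.
    """
    rng = random.Random(seed)
    flagged = 0
    for _ in range(runs):
        a = sum(rng.random() < p for _ in range(n))
        b = sum(rng.random() < p for _ in range(n))
        if two_sided_p(a, b, n) < alpha:
            flagged += 1
    return flagged / runs


print(f"A/A false-positive rate: {aa_false_positive_rate():.3f}")
```

If the machinery is sound, the flagged fraction should sit near the nominal alpha; a rate well above that points to a problem in the implementation or analysis rather than a discovery.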

    • Gwern:

      I guess they dodged a bullet by not happening to get a p-value of less than 0.05 for their preregistered hypothesis.

      It could be fun to see what arguments the ESP proponents come up with to explain why this new study did not actually represent a failed replication. If this study was worth doing at all, it’s to see what responses it elicits from that crowd.

      The only thing is that, unlike some other topics of junk science such as have arisen in social priming, evolutionary psychology, and whatever is the latest flavor-of-the-month econometric technique, ESP does not have the support of powerful groups within academia and the media. So I’m guessing any response from the pro-ESP group will be kind of weak, as they don’t have the power to publish content-free responses in top journals.
