The Puzzle of Paul Meehl: An intellectual history of research criticism in psychology

There’s nothing wrong with Meehl. He’s great. The puzzle of Paul Meehl is that everything we’re saying now, all this stuff about the problems with Psychological Science and PPNAS and Ted talks and all that, Paul Meehl was saying 50 years ago. And it was no secret. So how is it that all this was happening, in plain sight, and now here we are?

An intellectual history is needed.

I’ll start with this quote, provided by a commenter on our recent thread on the “power pose,” a business fad hyped by NPR, NYT, Ted, etc., based on some “p less than .05” results that were published in a psychology journal. Part of our discussion turned on the thrashing attempts of power-pose defenders to salvage their hypothesis in the context of a failed replication, by postulating an effect that works under some conditions but not others, an effect that shines brightly when studied by motivated researchers but slips away in replications.

And now here’s Meehl, from 1967:

It is not unusual that (e) this ad hoc challenging of auxiliary hypotheses is repeated in the course of a series of related experiments, in which the auxiliary hypothesis involved in Experiment 1 (and challenged ad hoc in order to avoid the latter’s modus tollens impact on the theory) becomes the focus of interest in Experiment 2, which in turn utilizes further plausible but easily challenged auxiliary hypotheses, and so forth. In this fashion a zealous and clever investigator can slowly wend his way through a tenuous nomological network, performing a long series of related experiments which appear to the uncritical reader as a fine example of “an integrated research program,” without ever once refuting or corroborating so much as a single strand of the network. Some of the more horrible examples of this process would require the combined analytic and reconstructive efforts of Carnap, Hempel, and Popper to unscramble the logical relationships of theories and hypotheses to evidence. Meanwhile our eager-beaver researcher, undismayed by logic-of-science considerations and relying blissfully on the “exactitude” of modern statistical hypothesis-testing, has produced a long publication list and been promoted to a full professorship. In terms of his contribution to the enduring body of psychological knowledge, he has done hardly anything. His true position is that of a potent-but-sterile intellectual rake, who leaves in his merry path a long train of ravished maidens but no viable scientific offspring.

Exactly! Meehl got it all down in 1967. And Meehl was respected, people knew about him. Meehl wasn’t in that classic 1982 book edited by Kahneman, Slovic, and Tversky, but he could’ve been.

But somehow, even though Meehl was saying this over and over again, we weren’t listening. We (that is, the fields of statistics and psychometrics) were working on the edges, worrying about relatively trivial issues such as the “file drawer effect” (1979 paper by Rosenthal cited 3500 times) and missing the big picture, the problem discussed by Meehl: that researchers are working within a system that can keep theories alive indefinitely even when the underlying effects are null.

It’s a little bit like the vacuum energy in quantum physics. Remember that? The idea that the null state is not zero, that even in a vacuum there is energy, there are particles appearing and disappearing? It’s kinda like that in statistical studies: there’s variation, there’s noise, and if you shake it up you will be able to find statistical significance. Meehl has a sociological model of how the vacuum energy and the statistical significance operator can sustain a theory indefinitely even when true effects are zero.

But nobody was listening. Or were listening but in one ear and out the other. Whatever. It took us nearly half a century to realize the importance of p-hacking and the garden of forking paths, to realize that these are not just ways to shoot down joke pseudo-research such as Bem’s ESP study (published in JPSP in 2011) and the notorious Bible Code paper (published in Statistical Science—how embarrassing!—in 1994), but that they are a key part of how the scientific process works. P-hacking and the garden of forking paths grease the wheels of normal science in psychology and medicine. Without these mechanisms (which extract statistical significance from the vacuum energy), the whole system would dry up; we’d have to start publishing everything where p is less than .25 or something.

So . . . whassup? What happened? Why did it take us nearly 50 years to hear what Meehl was saying all along? This is what I want the intellectual history to help me understand.

P.S. Josh Miller points to these video lectures from Meehl’s Philosophical Psychology class.

150 thoughts on “The Puzzle of Paul Meehl: An intellectual history of research criticism in psychology”

  1. Well, after an initial critical mass of people and journals citing each other had formed, it became self-perpetuating, and very shortly after could have been dissolved only by declaring that almost entire fields of research, including at prestigious universities, were invalid and their practitioners charlatans (as Feynman practically did in his “cargo cult” speech – and apparently so did Meehl). It’s easy to see why that didn’t happen, even without looking into any political aspects.

    So the real question, in my opinion, is how it all started. Maybe an interesting direction to start is how it came to be that legions of researchers started their careers without ever receiving training in formal epistemology.

    • As to how this all started, I’ve come across some interesting discussions while reading about the history of my Alma Mater, The University of Chicago, during the ’30s and ’40s when Robert M. Hutchins was president there. Chicago has the distinction of being the first major private US university built deliberately along the lines of the German research universities that had begun in the 19th century. (A similar model had been launched with the public land-grant universities that originally had been intended to focus on agriculture, but soon expanded to cover other fields as well.) In fact, William Rainey Harper, Chicago’s first president, sold John D. Rockefeller on the idea to get Rockefeller’s backing. These schools were thus focused less on traditional educational objectives than on active research. Undergraduate programs languished, since funding came from sources that wanted to see “results” in the form of “scientific” research. At Chicago, quite a war raged over the basic nature of undergraduate education and the mission of the university more generally. When WW II ended, the debate was settled by the massive amounts of government funding for the scientific research model. Hutchins resigned in 1949.

      Of course the research model worked well in areas like chemistry, physics, and much of biology and medicine. But for the social sciences and humanities it turned out to be a terrible model for the sorts of reasons explained so well here. Academia started down the road of doing research on a production model that suited the sort of expectations that came out of the great wartime research programs. As James Bryant Conant regretfully warned young academics, it was now “Publish or perish.”

      So, misplaced ideas about the nature of research, coupled with misplaced ideas about science, coupled with intense incentives for conformity created this mess.

      • I basically agree, but it is VERY important to point out that replication problems are rampant in chemistry, biology, medicine and even physics, not just psychology. A parsimonious inference is that the same factors underlie the problems in all of these fields, and one thing all of these fields have in common is the incentive structure.

        • Agreed. I hadn’t intended to suggest that somehow that’s not the case. My point was more about the history of how this state of affairs came about.

          And what’s interesting, having been involved with research for over 35 years now, is that the problems with reproducibility (which BTW may not necessarily be about statistical methods as much as other factors) reflect the sorts of questions that many in the more “hard” sciences (can’t think of a better term off-hand) are now asking, questions that are pushing the scientific method to its limits. Much research now depends heavily on computer simulations to test theories where direct observation is not possible; other areas require extremely far-reaching inferences regarding highly coupled systems; still others rely on inferences from events that cannot be repeated. I’m not casting aspersions on any of these generally, but I think we shouldn’t be surprised if results from such efforts are less reliable than we’re used to.

          And the pressure to publish definitely encourages risk taking and cutting corners. The peer review system too is likely overloaded and, so many think, even broken.

        • Remembering to ask this rather late in the game for this post, but I was wondering: do you have particular replication failures in mind when you mention chemistry and physics? I’ve seen plenty of examples in biology, medicine, psychology, and economics (none in sociology and anthropology, but I don’t doubt that’s mostly for my lack of looking), but I’ve not seen and haven’t been able to find myself any great collections of replication efforts (and subsequent failures) in chemistry & physics. Would be interested if you’ve got a particular cite(s) in mind!

        • Not exactly an answer to your question but Popper quotes a chemist – Joseph Black (Lectures on the Elements of Chemistry) as saying this: “A nice adaptation of conditions will make almost any hypothesis agree with the phenomena. This will please the imagination but will not advance our knowledge.” https://books.google.gr/books?id=wxzoBfQYhYAC&pg=PA71&lpg=PA71&dq=popper+black+please+the+imagination&source=bl&ots=CGm8hCpU71&sig=70CG84z9bPxQdK5wBtSK-Ww-Nxg&hl=en&sa=X&ved=0ahUKEwi5yfSD2IjTAhVDRhQKHQDcD3oQ6AEIGDAA#v=onepage&q=popper%20black%20please%20the%20imagination&f=false

    • My (naive) view is that it started with an influx of very large quantities of mostly Government money into research.

      The people disbursing the money had neither the inclination nor the incentives to really care about the quality of the product that came out.

      The problem was worst in fields where judging the quality of the product was indirect and hard, e.g., the social sciences.

  2. Many demos are this sort of pathway: do only this and nothing else, and the application flows from here to there, which creates the image of larger capability when it’s actually the only working path. Much of this “science” is that demo style, a version of forking paths but with focus on the demo path taken. Sort of the opposite of Robert Frost; you take the road that works, not the one less taken. For humor, look up the story of the first iPhone introduction; Jobs had a single working path through the early iOS, one so fragile that much of the team sat in the audience drinking because they were sure it would crash.

  3. One of Meehl’s repeated points was that nearly everything is somewhat correlated – at least, in real systems with some degree of complexity. I just did a little exploring that illuminates that.

    Take two straight-line x-y plots over the same x values, and let line 2 have 1/10 the slope of line 1. The correlation between their y-values (both Pearson and Spearman rank) is 1.0. Square line 2 so it becomes a quadratic. Now Spearman’s rank correlation is still 1.0, and Pearson’s is 0.968. Adding noise will reduce these values, of course, but not eliminate them.

    Since nearly all curves have some slope, nearly all quasi-monotonic curves will show correlation with one another. The actual value will depend mainly on the noise. Thus weak (or even not-so-weak) correlations may reflect little more than the practically inevitable non-zero slope of a line.
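    For concreteness, here is a minimal sketch of that little exploration in Python (numpy/scipy); the grid of x values, the random seed, and the noise level are arbitrary choices of mine, not part of the comment above:

        import numpy as np
        from scipy.stats import pearsonr, spearmanr

        x = np.linspace(0.0, 1.0, 1000)    # any positive range gives the same correlations
        line1 = 1.0 * x                    # line 1, slope 1
        line2 = 0.1 * x                    # line 2, slope 1/10

        print(pearsonr(line1, line2)[0])   # ~1.0 (two straight lines)
        print(spearmanr(line1, line2)[0])  # 1.0

        quad = line2 ** 2                  # square line 2 so it becomes a quadratic
        print(spearmanr(line1, quad)[0])   # 1.0 (still monotonic, so ranks agree)
        print(pearsonr(line1, quad)[0])    # ~0.968

        # Noise attenuates, but does not eliminate, the correlation
        rng = np.random.default_rng(0)
        noisy = quad + rng.normal(0.0, quad.std(), size=quad.size)
        print(pearsonr(line1, noisy)[0])   # still well above zero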

  4. I am not a psychologist, but I did train as one, and my impression was that the field tends to the ahistorical. A few of my professors taught their graduate courses in ways that were rooted in history, but the norm was syllabi stuffed with more recent papers (last ten or fifteen years maybe). The qualifying exam process in my particular department left me with the suspicion that there is no corpus of work that all psychologists, even those within a particular field, can agree is classic and worth knowing, which I think is a related fact. So while it would surprise me if many psychologists hadn’t heard of Meehl, it surprises me not at all that a particular insight he had got lost.

    There’s also the fact that it’s a pretty inconvenient insight. As your penultimate paragraph hints, it’s not actually clear how to stay employed in science without doing this stuff, never mind employed at a school prestigious enough to hand out PhDs, which means we probably have some selection bias in who’s training new scientists. And I’d bet (here’s a social science prediction for you) that most young scientists base their understanding of good research more on what their mentors did than on what their statistics professors said. I’m glad you note that the problem is bigger than psychology – this stuff bugged me in grad school, but it was much worse when I had a real job in epi; and when I went back for my stats masters, I swear, every time I asked about multiple comparisons in class it was as though I had placed something offensive in the punchbowl. Nobody had a satisfying answer for how to do it better.

    How are these issues faced by quantitative researchers in political science? I admit that I have found your attention to psychology over the years a little schadenfreudelicious, but sometimes it is easier to understand things that are surprising if you consider how they operate in a context that’s closer to home.

    • Erin:

      Good question about poli sci. I dunno. In recent years we have seen some junk poli sci, but even that is often published elsewhere (for a while, the journal Psychological Science seemed to be making a specialty of this sort of thing).

      Two things I’ve thought of are:

      1. Poli sci data are almost always observational. Yes, you can do an experiment in poli sci but it’s unusual and with rare exceptions is recognized to have big problems when trying to extrapolate into the real world.

      2. Poli sci research is typically conflictual. Hence claims in poli sci tend to be carefully scrutinized, and reporters are aware of this too. In contrast, in psychology research it’s not always clear who is the opposition. So a claim like power pose can just sail through because it’s taken as pure science, while a similar claim about politics might be called into question.

  5. I believe the reasons were mostly tribal rather than intellectual.

    Now, I had read material like Meehl’s, if not Meehl himself, in my semiotics course long before taking a course in statistics.

    When I went into clinical research, I easily recognized how common these problems were and saw meta-analysis as a way to redress some of them (I used “auditing scientific practice” as a pragmatic definition of meta-analysis).

    One intellectual barrier to other statisticians adequately appreciating (acting on?) the real issues did seem to be the following.
    (It is not just about the often less important file drawer problem; it blocked many statisticians from getting involved with considering multiple studies and how they were done and reported on and could be made sense of jointly – even if there ended up being only one study.)

    The (false) hope that studies could be taken as islands on their own (in David Cox’s words, stand mostly on their own) and hence that a definitive statistical analysis could be conducted on one study (with other studies being at most supportive).

    In line with this perhaps were

    1. Not realizing that publication biases were in the individual studies themselves rather than simply and only arising when studies were grouped together (this from someone I learned a lot about applied statistics from).

    2. I can take the study I happen to be working on as is (as in the comment given at http://www.stat.columbia.edu/~gelman/research/published/asa_pvalues.pdf) and not be concerned with what happened before or elsewhere unless the investigators draw my attention to something in that regard (most statisticians I talked to pre-2007).

    3. I can choose to find, and work only with, good investigators and so avoid Meehl-like problems.

  6. I think what’s needed, on top of an intellectual history, is also one of politics and personality.

    Back when I was studying physics in undergrad, it was pretty clear everyone took Feynman’s critiques of psychology rather seriously, and few considered psychology, sociology, etc. to be truly rigorous sciences. I’m sure if Meehl were better known outside of stats/psychometrics/psychology communities, he’d have been taken very seriously by any physics student who read him. In physics, some basic history of science has always been included in the education right from undergrad or even secondary school. The general attitude amongst the brighter students in physics was also one of skepticism and rigour—students always demanded proofs or derivations for equations, and remained forever skeptical of their validity until those proofs were provided, or until an experiment showed very good fit to the data. In general, the students in the field were conscientious, rigorous, worked hard, and were both skeptical and humble with respect to knowledge and its limits. There was also no real political homogeneity, and the data seem to show that IQ is very high in physics. A lot of this also seems to apply to fields like statistics, computer science, and other fields requiring high IQ and rigour, like philosophy.

    But when I switched into psychology, I noticed some pretty dramatic changes. The skepticism I was used to was pretty much gone, students (and even professors) were only really critical of ideas that conflicted with their political ideology, but not of the generally weak methodology that pervades psychological research. People seemed more interested in proving narratives than actually doing rigorous science, or “getting things right”. I noticed students did not work as hard. (Given that psychology and social psychology are very politically homogeneously left, it stands to reason that the conscientiousness is actually lower in the field.) Moving from physics to psychology, it was also impossible not to notice a rather sharp decline in overall IQ and general ability to reason logically and mathematically amongst fellow students. People were maybe more extraverted and generally emotional in psychology, but the personal qualities that I think matter most for doing good science were far more rare than they had been in physics. And in terms of basic mathematical and reasoning ability, the unpleasant truth is simply that a significant number of psychologists are flat-out incompetent in these domains.

    I think the reason Meehl is ignored is that if you care about doing good, rigorous science, and have high intelligence, you generally don’t go into psychology (some exceptions maybe being psychometrics, some aspects of cognitive psychology and neuropsychology). Meehl was a statistical freak–most psychologists were nothing like him. Psychology—especially social psychology—is what you go into if you have a strong pre-commitment to some personal narrative. If you’re generally intellectually curious about discovering something about the world, you’ll be more likely to go elsewhere. Meehl was ignored by so many because, I think, they didn’t really care about what he had to say, or lacked the intelligence to recognize the truth of his arguments.

    For example, on a public discussion group, Daniel Lakens remarked:

    “I’m teaching a workshop today where no-one knows what Neyman-Pearson means. When I had beers yesterday evening, and a supervisor dropped in, and this supervisor also had no idea. What I am missing is a ravenous desire to understand what we do, how we can generate knowledge best. As long as this is lacking, our field can only be so good as the people in it. We need much better training. I know almost nothing about doing good science myself, and am really trying to learn, but it is arguably a bit late, 10 years after I started my PhD. For the field, yes, I agree 120 years is a bit late. The discussions we have about how to do a good replication are 119 years too late.”

    My impression is that too many psychologists just flat-out don’t care about these methodological issues, or care more about academic careerism and/or proving their narratives than they do about methodology. Since psychology didn’t start this way, the only thing I can see causing it is the politicization of psychology, which made the degree more popular, lowered the standards, and created a culture that pushed away intelligent and curious individuals and drew in the wrong people.

    Maybe Andreski said it best:

    “Like many other things, the laudable ideal of combining education and research has its seamy side, in that graduate teaching offers an opportunity to recruit cheap (and in a way forced) labour for the captains of the research industry. Despite the lowering of standards connected with a massive increase in numbers, the fiction has been maintained that, in order to obtain a doctorate, the candidate must make a contribution to knowledge which, instead of an old-fashioned individual thesis, has more often than not come to mean a piece of work as somebody else’s research assistant. […]

    As the work of a research assistant is usually soul-destroying, this mode of financing graduate education adversely affects the quality of entrants into the profession, as many bright young men and women, prevented from really using their brains, and faced with the necessity of furnishing routine labour, prefer to do it for good money and gravitate towards advertising and market research. What is equally grave, moreover, the more intelligent students see through the sham of the whole enterprise and become either rebellious or cynical, or decide not to think too much and end by becoming timid and credulous conformists.

    While repelling the clever and the upright, the social research industry attracts dullards, for whom indeed it offers only the entry into the ranks of ‘scientists’, because no other form of ‘scientific’ research demands so little intelligence as door-to-door sociology or the lower forms of rat psychology. Rather than spanning the two cultures, as they ideally ought to, most of the social research industry’s employees fall between two stools, being neither literate nor numerate beyond the memorization of a few half-understood statistical formulae.”

    • As a psychologist I am sad to say that much of what you write is true. I switched from applied mathematics to psychology and have also been saddened by the lack of rigor among many of my colleagues and the preference for story telling and belief confirmation over good science. However, there are more and more of us trying to fight back – at least against the worst of the BS and I certainly have all of my students read the likes of Meehl and Popper.

    • Derek writes: “Given that psychology and social psychology are very politically homogeneously left, it stands to reason that the conscientiousness is actually lower in the field.”

      Nobody would ever accuse me of being sympathetic to the Left, but I don’t know how one can just take it for granted that they’re less conscientious than non-leftists.

      Or am I misinterpreting you here? Is it just the political homogeneity — of any sort — that you are suggesting leads to lower conscientiousness?

        • Conscientiousness tends to correlate with political ideology in such a manner that those leaning left are generally less conscientious than those leaning right. The correlation is not so strong you could make a strong inference about an individual person’s personality from their politics, but when a field is *almost entirely left*, it does basically follow that that field will, on the whole, be less conscientious than any other field that is not homogeneously left. It’s not homogeneity per se, but homogeneity toward the left, that allows the inference about personalities.

        Of course, this is only true if the correlations between personality and political ideology also hold among academics. Given the high conscientiousness required to succeed in academia, maybe this isn’t the case. My personal experience, however, says very left academics are still more open, more neurotic, and less conscientious than the more centrist ones, and so the personality-ideology correlations still hold as they do in the rest of the population.

        • Making broad generalizations about crude measures of psychological constructs to criticize psychology for engaging in broad generalizations about crude measures of psychological constructs surely must win the “Ironical Prize” for the day.

        • There are different standards for speculative and conversational comments on blog posts and for publicly funded research programs and academic papers that consume taxpayer funding.

          It seems to me you’re absolutely certain intelligence and competence can’t possibly be playing a role, which I think is a priori a position that needs more data than the opposite position. I’ve sourced most of my comments, and made it clear they are speculative. If you have some actual criticisms, I’m open to them.

        • To respond to your second paragraph:

          It’s not that IQ could not be playing a role, but that if it is playing a role it is not the direct role put forth. Rather, it would have to be that the power within the system that maintains the status quo is controlled by those with less intelligence rather than by those with more. Or that research methods are so complicated that those with IQs approaching the genius level are not smart enough to recognize the problems within their own field. Either of which is contrary to most models of the effects of intelligence on academic disciplines.

  7. Meehl: ” In terms of his contribution to the enduring body of psychological knowledge, he has done hardly anything”

    Actually he has done a lot. Not by advancing knowledge but by slowing it down.

    It is called the opportunity cost of dead-weight.

    • Just the usual repackaging of uncertainty into certainty via medical expert alchemy (or maybe repackaging of statistical alchemy by a clinician who has taken a couple of good statistics courses) ;-)

  8. Isn’t this just Upton Sinclair: “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”

  9. I think it is (like so many things) a consequence of many forces acting in concert: 1) Journals, promotion and tenure committees, funding agencies, journalists, and book publishers reward the “sexy” and spectacular over the incremental and cautious, 2) poor training in statistics and research design that increases the likelihood that researchers make gross errors and also makes it much less likely that reviewers and editors catch the errors, 3) a segmentation of the literature into lots of tiny ponds where we all get our papers reviewed by our fellow tadpoles, 4) the development of supposedly sophisticated analytic approaches (e.g., structural equation modeling, Hayes’ PROCESS models) that allow us to easily go down the garden of forking paths or just plain fabricate our findings without a large probability of ever being found out, and 5) a culture that actively discourages the correction of errors (ever tried to contact an editor about a serious error published in his/her journal?).

  10. Ask a random undergrad what’s considered an “easy” major. “Psych” will be a common answer.

    I suppose this watering down of Psych education has had a role to play in the propensity for crap generation.

  11. “I think the reason Meehl is ignored is because if you care about doing good, rigorous science, and have high intelligence, you generally don’t go into psychology.”

    Very sad and unfortunate, but probably true. Especially with computer science growing rapidly along with demands for STEM generally, I could see psychology having a recruiting problem, not to mention retention problem.

        • My argument is that intelligence and competence predict rigour and quality in research and theory. I am not claiming it is the sole factor, but that it is *a* predictor.

          Insofar as GRE is a decent proxy for IQ, the above link merely shows that it is plausible psychology does have a recruitment problem, when it comes to intelligence.

        • Let’s say that’s true. You would then expect the most intelligent psychologists at the top universities to have an inordinate amount of influence and to be leading the way in changing the field toward more logically defensible and robust methods. Do we see that?

          If not — Why?

        • I don’t think that intelligence necessarily leads to inordinate influence. That Meehl’s (and Cohen’s, Lykken’s, Feynman’s) criticisms have been ignored is a case in point. Often people ignore or even flat-out just dislike people that are more intelligent, especially when what is said puts the value of their research into question.

          It does seem to me that the strong methodological criticisms and suggestions for reform tend to come from those with backgrounds or competence in mathematics, statistics, philosophy, or other fields that demand more rigorous thinking and general intelligence. And it seems almost tautologically true that greater intelligence is required to both develop and pursue methods that are more logically defensible and robust than those in use by the majority, regardless of the field.

          My argument was less about differences in intelligence within the field, though, and more about differences between psychology and other fields.

        • If the game is to get the top position at the best universities and be famous enough to land book contracts, speaker fees, and awards, then the most intelligent people figure out how to do that. I’ve interacted with many of the psychologists who grace the wall of shame that Gelman’s blog has become and I say this without a touch of sarcasm–they are truly geniuses. Most of them could run the intellectual wind-sprints faster than you and me and are incredibly fun to talk to because of their capacious intelligence and scholarship. The question that haunts me is how such smart people can fail to figure out the most basic constraint on our work–sampling error. The answer is simple. The incentive structure, from the top (the granting agencies trying to please congressmen) to the bottom–publication norms–rewards colorful, splashy ideas, with only a flimsy p-value attached to gain that position of eminence. Rigor, replication, systemic thinking are for the boring researcher–the drone–who by definition is not a good scientist under the current incentive structure. Change the incentive structure and the smart researchers will figure things out. Rely on researchers’ integrity, and psychology and other fields will wallow in their own morass for years to come.

        • +1 @ Brent. It’s all in the game. Now, if only they had made a sixth season of The Wire about academia :-)

          One point against the IQ explanation from someone who generally tends to think IQ explains a lot of things:
          While apparently psych is seen as an easy major in the US, it is so popular in Germany that at many universities 70%+ of students have the best possible school leaving grade. German School leaving grades probably aren’t as strongly predictive of IQ as SATs because they’re only semi-standardised, but still, IQs tend to be high. I’d suspect the selective admissions cause higher average IQs than in physics at the undergrad level.
          Psychology’s replication woes are not fewer in Germany, though.

          If you read Ben Goldacre’s books, you get the impression that medical research falls prey to the same problems in spite of higher methodological standards, because the incentives are correspondingly stronger.

          If anything, this talk about disciplinary IQs and linking it to the replication crisis will probably serve as an incentive against too much introspection in a discipline. Psychology dared reform and look at its reputation now. Etc.

        • Amusing that while criticizing psychology you use a measure created by psychologists, and then talk about another measure, also largely created by people in psychology, as a proxy for it. Though it’s not really a proxy; they are both proxies for an unmeasured variable, a certain set of cognitive abilities and other skills.

  12. It’s a mistake to think that this is just a Psychology problem. It’s also a big problem in Biology, Medicine, and I suspect in Economics and Policy areas as well. It has far more serious ramifications than just the fact that stuff gets published about Air Rage that is probably a joke.

    Not to mention that if you are a good scientist you can’t publish this kind of crap on this kind of schedule so you get pushed out in a Publish Or Perish way, so if the good scientists overwhelmingly Perish… we’re wasting a crapload of money on grants to the remaining ones and we’re losing out on the things the careful scientists would have done.

  13. OK, I’ll take the bait: I don’t think that psychology’s presumed low IQ is the main problem here. First off, the hidden assumption is that fields other than psychology have less of a replicability problem. That very much remains to be seen. Second, what Feynman identified (and others before him) is that “you must not fool yourself, and you are the easiest person to fool”. This kind of fooling oneself is, I believe, mostly the result of hindsight bias and confirmation bias, combined with a perverse incentive structure. I am not sure that a very high IQ protects one against these influences. Finally, the average IQ may be higher in physics and mathematics, but we are considering not the mean, but the tails of the distribution, because only relatively smart students become professors. At that high IQ range, factors other than intelligence determine success and quality of work. These factors include luck, creativity, the ability to persist in the face of adversity, the ability to communicate clearly, the ability to recognize important research topics, the ability to recognize one’s own strengths and weaknesses, etc. It is tempting to blame outside factors (“psychologists are stupid and that’s why they are in this mess”) and believe one’s own field is just hunky-dory. However, what I’ve seen from other fields is not consistent with that idea. For instance, across all empirical sciences the p-value remains the holy grail. Even the evidence for the Higgs boson was essentially evaluated based on a p-value. Few people would blame the use of p-values on low IQ.
    Cheers,
    E.J.

    • This specific problem is less important for economics and finance because most researchers use the same public or quasi-public datasets. So you can’t pull a fast one. An exception is experimental economics, which uses lab experiment data as in psychology–but for whatever reason, economic experiments replicate much better than do psychology experiments. This may be because econ papers are more detailed in the methodology (we really spell everything out).

      On the other hand, a problem for econ and finance is “secret data” — data that is proprietary, usually. How does science advance if you cannot even see the data?

      • A problem with economics, though, is that most modeling studies are long term projects on a valuable dataset with lots of researcher degrees of freedom. These are not accounted for. That, in turn, implies that most empirical studies in this area are very interesting but merely exploratory. While most value this first property, they equally neglect the latter.

    • I think where IQ comes in is that in the hard sciences you can’t bullshit your way to success so easily — there are constant, strong reality checks. There is no alternative route to career success that does not require high intellectual ability.

      • What can be said for the smartest in the field? Those whose intellects match those of their peers in the hard sciences? There certainly are at least a small number. Why are they not leading the way towards more logically defensible and more robust methods?

        • In what possible way could you be assessing conscientiousness that assigns the leaders in their disciplines — which requires an inordinate level of conscientiousness to accomplish, even more so if you are arguing they are less intelligent — as lacking in conscientiousness?

        • If there is a construct likely to be strongly related to missing the logic, detail, and importance of methodological issues by those who are highly intelligent, it would be the level of development of critical thinking skills and the ability to apply them to that specific issue of the conceptual underpinnings of research methods. High levels of conscientiousness only suggest that someone is more likely to complete projects on time, not that the projects will be creative, well developed, and critically examined.

          The crux of conscientiousness as a construct is the need to check off items on a list and has nothing to do with an ability to understand the nuances of research methods.

          That said, I would still tend towards the most likely causal explanation being the incentive structure within the academic research system, rather than some conjectured level on an exceptionally crude hypothetical construct.

    • I agree. It’s not an IQ problem at all. The problem is that there aren’t many people who really want to disprove their own conjectures. It’s hard enough to think of something no one thought of before that might be true. To try really hard to disprove it is beyond most people’s capacity for self-delusion, as well as a recipe for career suicide. The reason this is more of a problem in some fields than others isn’t IQ… it’s inherent replicability. Well, that and the cultural willingness of fields to publish replications and/or refutations.

    • Despite my frustrations with psychology, I do agree with a lot of what you’re saying here. There’s some evidence that psychology is worse in some ways (e.g. Fanelli’s 2010 paper showing more publication bias), but then the replication rate in basic cancer research actually seems a lot worse than in psychology. If we’re just talking about replication, I do agree there’s little reason to attack psychology in particular.

      But I also don’t think psychology is bad just because it doesn’t replicate. Meehl’s other criticisms are still valid. Most of psychology is still “technologically worthless” and the field still relies on standardized effect sizes, and regularly neglects issues of clinical and practical significance. Frequently this is because most of the measurements aren’t connected to anything meaningful in reality in the first place. Terminology and conceptualization in psychology are often downright terrible, with the field full of tautological theory, meaninglessly vague folk-psychological terminology, and hopelessly broad constructs. The problem with psychology—even if it fixed its statistical and power issues—is that deficiencies in methodology (measurement) and theory would, I think, still leave the field largely broken. Social priming and ego depletion are perfect examples of all these problems, as it is unclear they were ever unified phenomena (large number of methods of induction and measurement), and in the latter case, whether and how it was significantly different from the everyday concept of ‘fatigue’ in the first place.

      I also agree it’s hard to know what happens at the tails. I can only say that I see an intelligence / competence difference still, and agree with Meehl (1973) when he said “I am somewhat old-fashioned in these attitudes. I believe there is no substitute for brains. I do not believe the difference between an IQ of 135 – perfectly adequate to get a respectable Ph. D. degree in clinical psychology at a state university – and an IQ of 185 is an unimportant difference between two human beings”. And although IQ may not directly predict replicability, or may not be the appropriate measure of competence here, I do think a general lack of competence plays a large role in psychology’s problems.

        • Informally, partly based on IQ, partly I suppose on conscientiousness, and partly on intellectual humility, curiosity, and a general preference for correctness and rigour over politics and/or flashiness (maybe perfectionism could tap this?).

          There are some (admittedly weak, but nevertheless suggestive) data supporting the IQ and conscientiousness claims, but otherwise this is all based on my own experience and perceptions. I feel strongly about these things, but they are still in the end just my personal intuitions and speculation on the matter. I’m open to other positions. Maybe it all does just boil down to publish or perish due to highly competitive funding, and maybe there’s nothing unique about psychology here. I don’t think so, but I’m not closed to the idea.

      • At this point in the discussion I’d really like to see some empirical data, say, of all science Nobel prize winners up until now. I don’t doubt that their AVERAGE IQ (if it had ever been measured) is far above average. But I’d be more interested in the variance: How low can your IQ be and still give you a chance of winning the Nobel?

        Of course, there’s no data set like this (at least to my knowledge), but my hunch is: you’d be amazed!

        Yes, cognitive capacity is a necessary ingredient for doing good science, but it’s far from sufficient, and perhaps once you’re above a certain threshold, other things matter more than adding another 10 IQ points. Curiosity, creativity, and doggedness would seem to matter. Regarding the last, remember the example Feynman cited for good psychological science, the rat researcher who tried to figure out how rats get their bearings in a maze by eliminating one possible factor after another (in all likelihood Feynman alluded to P.T. Young, a pioneer in motivation & learning research)? Brains had less to do with that than a rigorous application of scientific principles like Occam’s Razor.

    • Agree with most of what EJ said. However, one (possibly nit-picky) point: the evidence for Higgs was based on a five-sigma standard, which works out to a one-sided p of roughly 0.0000003 – a very high standard of evidence, even for those of us who sneer at p-values (myself included). Moreover, while I’m certainly speaking outside of my area of expertise, I would imagine that problems of causal density that we typically see in social sciences (especially psychology) are much less of an issue in particle physics, so NHST is less problematic in their field.
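      (For reference, here is a quick sketch, in Python with scipy, of the tail probability that a five-sigma threshold corresponds to under the usual one-sided convention; the snippet is mine, added for illustration.)

          from scipy.stats import norm

          p_five_sigma = norm.sf(5)  # one-sided tail area beyond 5 standard deviations
          print(p_five_sigma)        # ~2.9e-07, roughly 1 in 3.5 million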

      • Whether the data are impressive evidence for a theory depends on what else can explain the results, which depends on how precise the theory’s predictions are and how reliable the methodology is. That small p-value could just as well be because they messed up the experiment. I don’t know either way, but if that is the metric they are relying on, it is nearly certain that particle physics will be in trouble shortly. That is, if it isn’t already up against the wall.

  14. All researchers have incentives to publish a lot, but not all researchers face the same upfront research costs. In the social sciences, randomized controlled trials are so much more expensive to conduct than observational studies. I think the relatively high cost of experimental research in psychology is what’s behind the p-hacking and the production of junk science.

    If you’re an economist or political scientist who uses publicly available, observational data (e.g., employment figures, roll call votes, survey responses), then you face relatively minimal costs. You certainly don’t need a big, fancy grant and an army of RAs to do good work. And so it’s hardly the end of the world if, after a few months of work, your hypotheses aren’t confirmed. You move on to the next project.

    But if a study requires the involvement of hundreds of participants, and if the study has a relatively long time frame, the costs of research can be serious. What happens if you fail to confirm your main predictions? Do you simply walk away from the expensive, lengthy, grant-funded project and start over again? Or do you p-hack?

    • Wb:

      Interesting thought, but I’ve often argued the opposite, that studies with psych undergrads or Mechanical Turk (like that PLOS-One joke study we discussed in the previous post) are so cheap and easy to do, that a simple cost-benefit analysis almost requires that you run such a study: Cost in time, effort, dollars, and brainpower is close to zero, and the potential benefit is a publication in the Prestigious Proceedings of the National Academy of Sciences. All you need is a hook that will satisfy Susan Fiske, and you’re in. From that perspective, why not buy the lottery ticket? Every CV looks a little better when garnished with PPNAS.

  15. I was recently at Minnesota where I gave a paper at Meehl’s old home. Niels Waller, who worked with Meehl, invited me. I told the audience how the last time I’d been there, Paul was in the audience and was holding up my book EGEK (Mayo 1996) which had recently come out. He’s long been a hero, so I was highly gratified and blown away that he endorsed my work.

    I, too, have often expressed amazement that the state of statistical foundations, instead of making progress, has (at least wrt significance tests) gone backwards. In Morrison and Henkel’s (1970) Significance Test Controversy many points were rehearsed as well known errors and misunderstandings of tests. Today, many of those same points are taken as flaws in the tests rather than the users. For example, they wondered how psychologists could make the mistake of construing significance at a given level as more impressive when it arose from a larger rather than a smaller sample size. They conducted studies to see how psychologists could be so wrong at understanding Neyman-Pearson tests! Nowadays, people routinely commit this fallacy, but they don’t see it as a fallacy. (I can send link once I reach land.)

    In my talks with Waller a couple of weeks ago I asked why psych hadn’t improved on a key weakness Meehl was on about: the questionable connections, often, between the measurements in psych experiments and the phenomenon supposedly being measured. In answering, he explained that he and Meehl felt that fixing this problem, while doable, would require huge expenditures of time and money which might go into physics, but not psych. Psych behavior wasn’t deemed important enough to go beyond fairly crude measurements. (Meehl, by the way, was a Freudian as well as a Popperian, and it bothered him greatly that Freudian theory was deemed untestable while mickey mouse, low severity, “tests” of significance seemed to give a patina of testability to other psych areas.) (Can give links once I land.)

    Now that lack of replication has cropped up in other fields like biology, the criticisms from 40 and 50 years ago are being rehearsed. Never having fully explained the logic of testing, many current critics are falling into many of the same misunderstandings as older ones. I’m hoping Gigerenzer, Gelman, Glymour and I can help put things right at our session at the PhilScience Assoc meeting in November.

    • >>>wasn’t deemed important enough<<<

      If I were a funding agency, I'd be far more willing to spend a million dollars to understand the intricacy of some high temperature superconductor than to spend a million dollars on taking the precise measurements (whatever they may be) to figure out whether women really are wearing more Red on their fertile days.

        • Well, for one, a typical problem in Math does not need “huge expenditures of time and money,” nor “big, fancy grants and an army of RAs to do good work,” nor “the involvement of hundreds of participants & a relatively long time frame,” et cetera.

          For what little you spend on a mathematician (pencils & coffee), you get a pretty good return on your investment, I would say.

        • The biggest cost in a grant is funding for postdocs or PhD students. The RA and expt. costs are generally not that big (unless we are talking about supposedly fancy methods). I suspect a math grant proposal would not be hugely different from one of these other grant proposals in terms of money asked for.

      • Hopefully it’s not the case that all of psych is as vapid as many regard the work on how hormonal cycles influence women’s behavior; if it were, the research shouldn’t continue. Thus far, disappointingly, we don’t see that kind of recommendation coming out of the replication crisis in psych, even though it’s fairly clear which areas and types of studies are borderline science. I’ll be talking about this and philStat tomorrow at the LSE https://errorstatistics.com/2016/05/09/some-bloglinks-for-my-lse-talk-tomorrow-the-statistical-replication-crisis-paradoxes-and-scapegoats/
        However, if we started asking the “public” to rate the importance of some theoretical physics, we might not have much of it.

        • If you insisted on funding allocation by direct public vote you’d have very little research in the first place.

        • Well, a philo prof like Mayo makes over six figures and has been doing it for 3-4 decades. So that puts the price tag for her philo research (none of which rises to the level of vapid “influences on behavior of hormonal cycles in women”) at around $3 million.

        • Well there’s six figure salaries in Psych too.

          But with them you’ve also got to pay for the rats, the RAs to shock the rats, the surveys to ask women their skirt color, Amazon gift-cards for participating undergrads, gender-queer canvassers, Mechanical Turk payoffs, et cetera.

  16. I admit I’m not familiar with Paul Meehl’s work; but based on the post above, could it be that he showed everyone the problem clearly but he was ahead of his time because viable alternatives only came much later with hierarchical modeling / Stan etc?

    i.e. Many smart people understood Meehl and the problem with what they were doing but didn’t know what to do instead?

  17. Ascribing the use of knowingly poor research methods to inadequate intelligence within an entire field feels a little like an argument that one might make in kindergarten. I am in no position to judge whether or not the argument is correct but at the very least I do not think it is well posited. For starters, the argument implicitly assumes that researchers who possess a high IQ would endeavor to make it a priority to perform top-notch research. However, as another commentator has already mentioned, it seems equally as likely that a high IQ is also compatible with the desire to maximize monetary return or within-period fame or speaking gigs or…any number of alternative goals that may be best maximized by poor research practice. Let us not make the mistake of assuming a high IQ necessarily compels a researcher to perform good work in the strict sense. Also, this is to say nothing of the fact that IQ itself is an illusive measure.

    Beyond that, I do not believe pooping on an entire field of research is the best use of time.

  18. Since most of us agree on the good judgement of Paul Meehl, here’s a comment by him that’s relevant to the current focus on “group-learning” and other unconventional methods of pedagogy:

    “In one respect the clinical case conference is no different from other academic group phenomena such as committee meetings, in that many intelligent, educated, sane, rational persons seem to undergo a kind of intellectual deterioration when they gather around a table in one room. The cognitive degradation and feckless vocalization characteristic of committees are too well known to require comment. Somehow the group situation brings out the worst in many people, and results in an intellectual functioning that is at the lowest common denominator, which in clinical psychology and psychiatry is likely to be pretty low.”

    • Meehl has some really memorable quotes. As a graduate student in Psychology, my favorite comes from “Theoretical risks and tabular asterisks”, which sums up my enthusiasm with the (not so) current state of affairs in psychological research:

      “I consider it unnecessary to persuade you that most so-called “theories” in the soft areas of psychology (clinical, counseling, social,personality, community, and school psychology) are scientifically unimpressive and technologically worthless.”

    • Rahul:

      “Since most of us agree on the good judgement of Paul Meehl” . . . I agree. It makes me wonder if we could find some anti-Meehl people out there.

      I wonder about prominent psychologists such as Daryl Bem, John Bargh, Daniel Gilbert, and Susan Fiske who have produced or defended much of the research that we spend so much time dissing on this blog: Would they say that Meehl was a know-nothing blowhard who was just getting in the way of serious research, or would they say that they’re the ones following the true spirit of Meehl, or maybe there’s a different tack that they would take?

      I assume Meehl would not have agreed with the claim by Gilbert’s publicist that “the replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%.”

      I’m guessing they’d dismiss Meehl as a crank, but who knows?

      • It’s indistinguishable from 100% using the shockingly bad analysis by Gilbert. Has this been discussed? I don’t think so, and yet they’re jumping in to re-replicate the studies that Gilbert had fidelity problems with. They should straighten out what will count as nonreplication ahead of time.

      • I’m guessing two things.

        1. As a well known statistician said to me after a talk I gave in 1997 on challenges in making sense of published randomized clinical trials – “you paint an overly bleak picture of clinical research”. I was hoping they were not too wrong about that and I am sure they thought they were not at all wrong about that. So things are not that bad (recall “Empirical estimates suggest most published research is true”)

        2. As Mayo raised above, no group (yet) sees themselves in a position of expecting to be able to meaningfully improve things and benefit from the efforts/risks of trying. NMO (not my opportunity.)

        • >”As Mayo raised above, no group (yet) sees themselves in a position of expecting to be able to meaningfully improve things and benefit from the efforts/risks of trying. NMO (not my opportunity.)”

          Maybe it would be worth it if these funding institutions spent ~20-30% on some kind of internal affairs department to investigate incompetence/fraud. Just like we don’t want police stealing things or not knowing how to safely handle their weapons, we don’t want researchers cherrypicking the data they report or misinterpreting p-values (or whatever the primary method of inference will be in the future).

        • Good idea. I’d not even wait for complaints. I’d start random audits the way they do for drug use in HAZMAT Drivers etc.

        • Like Rahul, I very much would like to see random audits but I would be very surprised if any funding agency would want this.

          Funding agencies have limited funds – why decrease the number of studies funded, and muddy all the good publicity that the publicists of those you funded will generate for you, with the “only 50% of audited studies found adequate” possibility (even if you can count on some academic group arguing it might have been 100% adequate)?

          The hope here seems to be with foundation grants, where the founders simply want to make the world better (or that’s what I heard anyway) http://metrics.stanford.edu/about-us

  19. I actually think it’s funny not to consider Meehl hugely influential, though for other reasons. His whole treatment of the superiority of statistical prediction over clinical prediction has been hugely influential in some of the applied settings that I have run into.

  20. Aren’t most of these foundations set up to achieve tax relief? So the money is being donated for all the wrong reasons. No basis for thinking the funders have an incentive to support “good” research or to see it improve. In effect the universities are functioning as giant money-laundering agencies, perhaps. After all, bad studies don’t hurt the funders any; just the hapless people who try to govern their lives or raise their children by the results. Not to be overly cynical or anything, nor am I a conspiracy theorist, but Lysenkoism doesn’t naturally prevail over good knowledge – someone has to want to, and have the power to, keep it on top.

    • Proposal:

      Set up a research budget, and then have individual taxpayers vote on a random selection of proposals using a score-voting system (assign a number from 0 to 10 to each proposal). From these votes, estimate an average preference for each proposal (averaged across all taxpayers) and use it to set the probability of that proposal being funded in a lottery run by a cryptographic random number generator.

      Let the individual taxpayers also vote on the total dollar amount (per citizen per year) they believe would be optimal to allocate to funding research. Have the actual dollar amount track the 10 year exponential weighted moving average of this quantity (constantly inflation adjusted to present dollars) so that total funding changes slowly and tends to converge on what the taxpayers actually want on average.

      To make the whole thing manageable, each taxpayer could “subscribe” to a list of topics they’re interested in, and be given the opportunity to vote quarterly on a random selection from within those topics.

      I do NOT think this would be all that difficult, and I think it would be FAR superior to anything we’ve got currently. (A rough sketch of the mechanics is below.)
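
      A minimal sketch of how the mechanics above might work, in Python. Everything here is my own illustration of the proposal, not a specification: the function names, the 0-10 score matrix, the use of Python’s secrets module as the “cryptographic” random number generator, and the ~10-year smoothing constant are all assumptions.

      ```python
      # Hypothetical sketch of the proposal above: score votes -> funding
      # probabilities -> lottery draw, plus a slow-moving per-citizen budget
      # tracked by an exponentially weighted moving average of budget votes.
      import secrets
      import numpy as np

      def funding_probabilities(scores):
          """scores: (n_voters, n_proposals) array of 0-10 score votes."""
          mean_pref = scores.mean(axis=0)        # average preference per proposal
          return mean_pref / mean_pref.sum()     # normalize to probabilities

      def run_lottery(probabilities, n_awards, rng=secrets.SystemRandom()):
          """Draw winning proposals without replacement, weighted by preference."""
          remaining = list(range(len(probabilities)))
          weights = list(probabilities)
          winners = []
          for _ in range(min(n_awards, len(remaining))):
              r = rng.random() * sum(weights)
              cum = 0.0
              for idx, w in enumerate(weights):
                  cum += w
                  if r <= cum:
                      winners.append(remaining[idx])
                      del remaining[idx], weights[idx]
                      break
          return winners

      def smoothed_budget(annual_votes, alpha=2 / (10 + 1)):
          """EWMA (~10-year window) of the average dollar amount voted per citizen per year."""
          budget = annual_votes[0]
          for v in annual_votes[1:]:
              budget = alpha * v + (1 - alpha) * budget
          return budget

      # Toy usage: 1,000 sampled voters scoring 20 proposals, 5 awards.
      scores = np.random.randint(0, 11, size=(1000, 20))
      probs = funding_probabilities(scores)
      print(run_lottery(probs, n_awards=5))
      print(smoothed_budget([120, 130, 125, 140, 150]))  # dollars per citizen per year
      ```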

      • Actually, having voters vote on a random selection of proposals would be a neat way to solve a variety of other issues, especially political issues. I can trust a mob of randomly selected people more than Congress.

        That being said, the proposals would merely change the incentives of these scientists, as they try to focus-group test their pitches to be ‘popular’ enough to win over the masses. You wouldn’t see niche issues investigated, but instead “trending” topics that could gather quick upvotes (“Can viewing ‘The Dress’ and other internet memes help cancer patients survive?”). I’m not sure whether this system would lead to better outcomes. It won’t lead to worse outcomes, though.

        • I like the idea of a random selection of people “conscripted” to, as a full-time job, evaluate government proposals. Pay them well, maybe 2x GDP/capita/yr, and have them serve say 6 year terms, with 1/6 replaced each year by new random people. Choose them by crypto-RNG from the Soc Security rolls for people between 18 and say 75, must be healthy enough to travel by air, and have at least a HS diploma.

          Get about 6000 of them I think, so we’re replacing 1000 each year.

        • I’m not convinced this is a good idea at all. In some aspects it reminds me of the Communist-era rabidly anti-intellectual milieu in China, USSR etc.

          Say you were admitted to the hospital: would you prefer 20 randomly chosen hospital employees, drawn from among janitors to kitchen staff etc. to be voting on your best treatment course? All healthy and with a HS diploma of course.

          If not, then why would you think that a broad body of untalented decision-makers should work wonders for running the government? There’s nothing magical about direct democracy.

        • Not RUNNING the government, basically AUDITING the government, i.e. providing some sort of evaluation of policies and laws. Let me put it this way: Reddit and StackExchange both work better than the US government, so I believe this would be an improvement.

        • If you let random people with a HS degree vote on complex research proposals, you are going to get something like the typical YouTube comment thread (or worse), and nowhere close to Reddit or StackExchange.

          Forget running the nation; have you even seen a corporation run its show by random vote agnostic to talent or training?

        • To be clear, I’m not suggesting we just get people to vote independently at home in their pajamas. I’m saying get these people together, if not all together at least in say 4 or 5 regional groups, and have them discuss the issues (hence the need to be healthy enough to travel).

          In that context, people with expertise are able to influence those without expertise through their discussion.

        • >>> people with expertise are able to influence those without expertise through their discussion.<<<

          So, what advantage did we get by muddying up the pool in the first place?

          If you are anyway going to count on the experts being able to influence those without expertise, why not just stick to a group of experts?

          I can totally understand if you are arguing for a larger or more diverse decision-making body of *experts*. But that's a different argument entirely.

        • Also, I think the Communist USSR ideas were more or less the exact opposite. Get some elites to decide what everyone else would do, and then ram it down their throats with no recourse… Whereas this is all about giving the people who are affected by policies a chance to audit those policies and provide feedback. If nothing else, just publishing a kind of “congressional audit record” in which this group does score-voting on various policies, procedures, outcomes, etc and then makes a public record of those votes together with published “opinions” written by small groups within the council. Kind of like “majority opinion” and “dissenting opinion” in the court system.

        • Ok, now we are shifting the goalposts: I’ve nothing against just getting non-binding feedback. The decision makers are not bound to act upon it.

          What I thought was silly was replacing expert decision-makers by a direct democratic vote agnostic to training, experience or expertise.

        • I admit, I’d like something more binding than pure feedback, but I am also fine with saying that we shouldn’t have pure democracy. When it comes to funding grant applications, I’d say score voting from scientists counting equally with score voting from random pool of people to generate probabilities for a crypto-random lottery would be actually a great idea. Talk about checks and balances.

          I have to admit though, I do like the name Boaty McBoatface.

        • >>>score voting from scientists counting equally with score voting from random pool of people<<<

          Sounds like buying a perfectly decent sensor & adding a large pool of random noise to it.

          Just because you worry the sensor may not be absolutely accurate doesn't mean adding random noise makes it any better.

        • Suppose you have an aerial missile targeting system and every time you ask it to shoot a missile at an enemy tank it chooses a University where people teach Monetarist theories of economics….

          Then just asking 10 random people off the street to tell you whether a particular satellite photo looks like it has an enemy tank in it would get you a much better result.

          I think we’ll have to just agree to disagree here, but I wonder if you’ve ever applied for grant funding and gotten comments back from grant funding committees? I would be surprised if you had because everyone I know who has gets comments back that are for the most part full of shit.

        • Also, “a broad body of untalented decision-makers”? That’s a pretty elitist attitude.

          Out of these 6000 randomly selected people something like 2000 of them would have a college degree, 800 of them would be masters or doctorate or professional degree educated (engineering, law, medicine, etc). That means there’d be more highly educated people in this body than in all of Congress!!! (there are only 535 people in house and senate put together, and how many engineers or doctors? Like 5 total? it’s all full of lawyers and career politicians)

          Every single one of them would have experience making hundreds of important decisions affecting their families each year, from career, job, nutrition, investment, where to live, commuting times, purchasing insurance, personal medical decisions related to diseases or injuries in their families…

          A truly random selection of Americans would be a pretty competent group on average. It’s the ideologues who run for office that worry me.

        • Is it elitist if I insist that only doctors treat my cancer? Or insist that only civil engineers certify that my house’s foundation is structurally sound?

          The college degrees hardly matter if they are not in a relevant domain! Would it be much better if we knew that the committee deciding whether a pesticide should be banned was full of PhDs in French literature?

        • You have to remember, we’re comparing this to CONGRESS not a system where in every decision we find some people who are experts in that subject matter and then just ask them.

          Also, doctors kill something like 250,000-500,000 people a year in the US from preventable mistakes, and the biggest reduction in cancer seems to have come from getting people to stop smoking cigs not improvements in treatment.

          I think people over-estimate the ability of most experts and under-estimate the ability of others.

        • >>>doctors kill something like 250,000-500,000 people a year in the US from preventable mistakes<<<

          How is that relevant? Fine. Even experts make mistakes. Sure. I doubt anyone claims doctors are infallible. We could try and make better doctors.

          But are you arguing for randomizing the delivery of medical care as a solution? If, instead of doctors, you solicited random opinions to guide treatment, do you think you’d get fewer deaths?

      • Does *direct* democracy really work well? Is the tyranny of the masses what we want?

        Why do we not allow lay-people to vote on every law & decision? Surely the technology is at the point where this is feasible, at least in many smaller nations.

        If you ran a nation by direct vote, I don’t think you would like the product you get very much. Sounds scary.

        • Well, I’m willing to give it a try, or at least bring it into the mix. For example using the popular estimate and the expert evaluations together on equal footing to assign the funding lottery probability.

          One thing I know is that the funding system we have is seriously problematic at many levels.

        • Rahul, Daniel:

          I can’t believe you’re debating this political point when you could be over at the other post commenting on the delicious jerk pork and plantains that I had for lunch today.

  21. This may sound over the top, but after looking at that paper again I would be in favor of a new Nobel prize category named after Paul Meehl. He should be awarded the first one posthumously, and it should be declared that he is the only person ever to receive such an honor. The paper just hits so well on all the points that have caused/allowed researchers to deviate from their stated mission.

    As I said, over the top.

  22. RE: “… I do think a general lack of competence plays a large role in psychology’s problems.” & “Maybe an interesting direction to start is how it came to be that legions of researchers started their careers without ever receiving training in formal epistemology.” & “… Paul Meehl was saying 50 years ago. And it was no secret. So how is it that all this was happening, in plain sight, and now here we are?”

    In the years before his death, Meehl was torn between two explanations for psychology’s failure as science. One was my Hubble hypothesis:
    https://drive.google.com/open?id=0B9ZkjwmG6iM4QTdOdUFCRmY0Qk0
    which uses the Hubble Space Telescope as a metaphor to argue how psychology’s failures are sustained by long-standing, mutually enabling weaknesses in the paradigm’s discovery methods and scientific standards. These interdependent weaknesses function like a distorted lens on the research process by variously sustaining the illusion of theoretical progress, obscuring the need for fundamental reforms, and both constraining and misguiding reform efforts.

    Although an enthusiastic fan of the Hubble Hypothesis, Meehl also cited the incompetence of most psychologists as a major culprit and was hopeful that a future, more competent generation of psychologists would be able to make substantial progress with existing methods — a position I considered tantamount to leaving the runway lights on for Amelia Earhart. He elaborates on the incompetence problem in this 20-minute fireside-chat audio memo to me from 1978, with some delightful and demoralizing examples from his own Department of Psychology at the University of Minnesota:
    https://drive.google.com/open?id=0B9ZkjwmG6iM4dlZqOEI2aGNfOXM

    • John: Thanks for sharing this, especially the fireside chat!

      Meehl does seem to set the Hubble hypothesis aside, but here, I think, is an excellent example of the Hubble hypothesis: http://www.sciencedirect.com/science/article/pii/S0895435610001381

      What happened here is that the supposedly preferred adjustment method in SAS to remove artifacts from the plot was an implementation of a misconstrued formula that actually added artifacts. Human error had resulted from a lack of mathematical/statistical understanding of the technique.

      Now I had a real challenge convincing the authors of the paper I worked with that something was wrong with the math, as they had done a literature review of papers using this technique (correspondence analysis) and every paper had used the preferred adjustment (I think they mentioned they reviewed over 100 papers) – not everyone could be wrong, right?

      The collection of published papers that used SAS and the _preferred_ adjustment all have the Hubble’s lens flaw and have studied illusory phenomena.

      If they provided what is called a Burt matrix, which is a sufficient statistic, the Hubble’s lens flaw can be corrected.

      If they did not, they are a relevant subset of published papers that are 100% known for sure not to be reproducible!

      John: Andrew will give you my email if you wish to follow-up on this.
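
      For readers wondering what the Burt matrix mentioned above is: it is just B = Z'Z, where Z is the 0/1 indicator (dummy) matrix of all the categories, i.e. the table of all pairwise cross-tabulations of the variables, which is why it carries the information that (multiple) correspondence analysis needs. A minimal sketch, with made-up variable names and data:

      ```python
      # Minimal sketch: building the Burt matrix from categorical data.
      # The variable names and values below are invented for illustration.
      import pandas as pd

      df = pd.DataFrame({
          "smoker":   ["yes", "no", "no", "yes", "no"],
          "exposure": ["low", "high", "low", "high", "low"],
          "outcome":  ["case", "control", "control", "case", "control"],
      })

      Z = pd.get_dummies(df).astype(int)   # indicator matrix, one column per category
      burt = Z.T @ Z                       # Burt matrix: all pairwise cross-tabulations
      print(burt)
      ```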

  23. The exact same phenomenon that occurred with Meehl occurred with Keynes. He criticized the then-emerging field of econometrics and no one listened to him. Now, after more than 50 years, we have realized that everything he said was correct.

    It’s not so much that people were not listening. It’s that they didn’t want to hear it.

  24. The ease with which criticism in the form of a published paper could simply be ignored, disappeared, no matter who was making it, stands in interesting contrast to the hysterical shrieks provoked by “destructo critics.” The latter are effective where the former weren’t, because the critique sticks to individual papers; if valid, it forces individual accountability, both of the authors and of their enablers. The resistance to issuing retractions is another illustration of the difference in effectiveness between targeted and general criticism.

  25. I have been thinking about some things:

    1) I think it was Meehl who stated or suggested that everything correlates with everything. I think this has been supported by Standing et al. (1991), “Empirical statistics: IV: Illustrating Meehl’s sixth law of soft psychology: everything correlates with everything.” (A small simulation sketch after this comment illustrates the point.)

    2) I have been thinking about how psychological “knowledge” and the psychological literature might be more a reflection of what those engaged in psychological research choose to hypothesize and study than a valid or balanced reflection of reality. This view may also be applied to individual studies: only the constructs, terms, or measures actually included in a study can end up in its conclusions and in the further hypotheses, theories, and research built on them.

    Combining 1) and 2), would it be reasonable to say that one could (with some clever thinking and testing) design a study that finds significant correlations or associations between certain constructs (and thus supports a certain view, or certain conclusions) merely by including certain constructs or measures in the study and excluding others?
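
    A toy simulation of point 1), the “everything correlates with everything” crud factor: if many nominally unrelated measures share even a weak common component, then with a large enough sample nearly every pairwise correlation comes out “statistically significant.” The sample size, number of measures, and loading below are arbitrary choices of mine, just to show the mechanism:

    ```python
    # Crud-factor toy simulation: tiny shared component + large sample
    # => nearly all pairwise correlations reach p < .05.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, k = 10_000, 10                        # 10,000 "participants", 10 measures
    shared = rng.normal(size=(n, 1))         # weak common factor (the "crud")
    X = 0.2 * shared + rng.normal(size=(n, k))

    significant, pairs = 0, 0
    for i in range(k):
        for j in range(i + 1, k):
            r, p = stats.pearsonr(X[:, i], X[:, j])
            pairs += 1
            significant += (p < 0.05)

    print(f"{significant}/{pairs} pairwise correlations have p < .05")
    # Typically almost all of the 45 pairs, even though each true correlation is only ~.04.
    ```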

    • The answer is obviously: yes, of course. Meehl talks about this in his 1989 lecture series when he covers pilot studies, the crud factor, and methods covariance, and how they combine to create an epistemological sieve (lecture 7; t ~ 51 to 65 minutes).

      • Would a focus on effect sizes combat this problematic issue?

        Might a focus on effect sizes instead of statistical significance, for instance, push researchers to search for better explanatory constructs, measures, and theories, and make it harder to misrepresent matters, intentionally or unintentionally?

        • Looking at effect size is a step in the right direction, but if you want to test your actual theory (rather than the strawman null hypothesis), you would usually collect a different type of data altogether.

          It starts with people designing their study around NHST. Then they try to salvage it by looking at effect sizes or using Bayes factors or whatever, but it is too late: they already collected the wrong type of data.

          E.g., you would be trying to explain the shape of individual learning curves rather than comparing the average scores of group A vs. group B.

        • “Looking at effect size is a step in the right direction, but if you want to test your actual theory (rather than the strawman null hypothesis), you would usually collect a different type of data altogether.”

          My knowledge and capabilities regarding statistics are extremely limited; I can only think very basically and intuitively about most statistical matters. I have just read some things about regression analysis that intuitively fit with the point I was trying to make about how using only certain constructs and measures might result in a misinterpretation of reality, or of what is really crucially important in the thing being investigated. The following may not make any sense, but I would appreciate it if I may share it:

          If I understood things correctly, in regression, for instance, you can look at the “importance” of correlated variables and at the “unique” contribution of each variable. Again, I can only think about these things on a very basic level, but I would reason that this might fit well with “solving” the possibly problematic issues of 1) correlated variables (“everything correlates with everything”) and 2) intentionally or unintentionally misrepresenting things via study design, by including certain variables and excluding others.

          Intuitively, I reason that using regression analyses and looking at things like the “importance” of variables and their “unique” contributions might fit well with attempting to focus on effect sizes (I think R² can be considered an effect size?) and with theory development and testing.
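
          A small simulation of why the apparent “importance” or “unique contribution” of a regression coefficient depends on which correlated variables are included in the model; the data-generating process and variable names are invented for illustration:

          ```python
          # Toy omitted-variable illustration: x looks "important" on its own,
          # but its coefficient collapses once the correlated confounder is included.
          import numpy as np

          rng = np.random.default_rng(0)
          n = 5_000
          confounder = rng.normal(size=n)                # the variable doing the real work
          x = 0.8 * confounder + rng.normal(size=n)      # correlated with the confounder
          y = 2.0 * confounder + rng.normal(size=n)      # x has no direct effect on y

          def ols(y, *predictors):
              X = np.column_stack([np.ones(len(y)), *predictors])
              coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
              return coefs[1:]                           # drop the intercept

          print("x alone:        ", ols(y, x))               # sizable coefficient on x
          print("x + confounder: ", ols(y, x, confounder))   # x's coefficient near zero
          ```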

        • Regression can be good for making predictions/interpolation, but the coefficients only mean something if your model is correct (or at least approximately so). And in reality, if you are using a regression model, it is probably an arbitrary one chosen because you had whatever data available. See discussion here (I just saw someone responded to me there and haven’t checked it yet):

          https://statmodeling.stat.columbia.edu/2023/02/28/multiverse-r-package/#comment-2183169

          What I am talking about is completely different, check out what this researcher did with a theory about neuronal growth developed during WW1:

          https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2916857/

          There is no comparing group A vs. group B, no effect size, nothing that looks like your standard psych or bio paper. Instead, neurons were observed carefully, a possible explanation for the observations was guessed (abduced), and then the logical consequences of this theory were deduced and explored via computational modelling.

          I’ve seen over and over that people were making great progress in the first half of the 20th century using this approach; then they switched to NHST around 1950. Now everything seems “so complicated” because researchers are only collecting infertile/sterile data like the difference between groups.

        • “Instead neurons were observed carefully, a possible explanation for the observations was guessed (abduced), then the logical consequences of this theory are deduced and explored via computational modelling.”

          The paper is too complicated for me, but I appreciate your description of their approach. To me, this seems to resonate with my wondering about the role of logic, reasoning, observation, and common sense in Psychological Science.

          It sometimes seems to me that these things aren’t even acknowledged as being important in conducting Psychological Science. I am still wondering why I wasn’t taught much about logic, reasoning, and argumentation at university. It also makes me think about the philosophers and scientists that lived hundreds of years ago, and what they thought and wrote about. I think observation and reasoning might be severely under-appreciated in Psychological Science.

        • >”I think observation and reasoning might be severely under-appreciated in Psychological Science.”

          Earlier when I mentioned learning curves I was thinking of this paper and ref 40 therein (you can get both on sci-hub):

          The method of stating theories as mathematical functions, deriving certain observation equations from these initial postulates, and checking these equations against experimental data has been called by Troland (43) the method of “mathematical hypothesis.” The work of Troland (43) and Hecht (13) on the visual receptor process is one illustration of the application of this method to some problems related to psychology. Mathematical hypotheses have been used extensively in the physical sciences. This method is valuable because it makes possible an accurate quantitative test of the agreement between scientific theory and experimental data.

          The use of “mathematical hypotheses” does not mean simply fitting an equation to some data. It means the use of rational as opposed to empirical equations. There seems to be some confusion in the literature regarding this distinction. For instance, Max Rleyer’s (25) equation, which in essence is an empirical equation, is sometimes referred to as a rational formulation. Rarlow (1), instead of recognizing the empirical nature of both Rleyer’s (25) equation and of Thurstone’s (35, 39) earlier equations, states that perhaps the distinction between an empirical and a rational equation may be the extent to which the parameters can be determined by rigorous methods. It is, therefore, essential to distinguish accurately between rational and empirical equations. Empirical equations are simply equations which happen to fit certain data. They are not derived from scientific hypotheses, and consequently our knowledge of scientific theory is not affected by the use of empirical equations. A rational equation is developed from certain basic theories and represents accurately the relationships that will be found in the experimental data, if these theories are true. If the equation does not fit the data, it proves that the theories in question cannot be applied to the particular data. If the equation does fit, it proves that the theories can be applied to those data. That a given theory must be applied to certain data cannot be proved. A discussion of the value of rational equations in advancing the knowledge of scientific theory has recently been given by Gray (12). Rational equations are powerful tools for the advancement of science, because they make it possible to test accurately the agreement between experimental data and scientific theory.

          https://www.tandfonline.com/doi/abs/10.1080/00221309.1934.9917847

          That may be more accessible to you, I don’t know. But you can see that even then, in the 1930s, it was necessary to explain philosophy-of-science 101 in psychology journals (e.g., the difference between deriving an equation and fitting a curve), and yet progress was being made.

          Today it should be much easier using computers; back then they had to work everything out by hand, which forced them into making extra simplifying assumptions. As mentioned above, I’ve found it valuable multiple times to go back to the pre-1950 literature and then implement the model (or a variation) using modern technology.
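
          To make the quoted distinction concrete, here is a toy contrast of my own (not the 1934 paper’s model): a “rational” equation derived from an explicit assumption about learning, versus an “empirical” polynomial that merely happens to fit the same data over the observed range:

          ```python
          # "Rational" vs "empirical" equations, toy version. The rational equation
          # is derived from an assumption: improvement is proportional to what
          # remains to be learned, dP/dt = k*(A - P), so P(t) = A*(1 - exp(-k*t)).
          # The empirical equation is a cubic polynomial that simply fits the points.
          import numpy as np
          from scipy.optimize import curve_fit

          rng = np.random.default_rng(3)
          t = np.arange(1, 21)                                   # practice trials
          true_A, true_k = 100.0, 0.25
          perf = true_A * (1 - np.exp(-true_k * t)) + rng.normal(0, 3, size=t.size)

          def rational(t, A, k):                                 # from dP/dt = k*(A - P)
              return A * (1 - np.exp(-k * t))

          (A_hat, k_hat), _ = curve_fit(rational, t, perf, p0=[80.0, 0.1])
          poly = np.polyfit(t, perf, deg=3)                      # empirical curve-fit

          # Both fit the observed range, but only the rational equation's parameters
          # (asymptote A, learning rate k) say anything about the theory, and only
          # it extrapolates sensibly; the cubic typically wanders off.
          print("rational at t=40:", rational(40, A_hat, k_hat))
          print("cubic at t=40:   ", np.polyval(poly, 40))
          ```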

  26. “So . . . whassup? What happened? Why did it take us nearly 50 years to what Meehl was saying all along? This is what I want the intellectual history to help me understand.”

    I have thought about this from time to time, and my conclusion is that 1) it is hard (in certain cases and for some people) to determine who or what is right or wrong, or good or bad, 2) there are almost no real mechanisms in place in Psychological Science to correct things or hold people accountable, and 3) the first two things might contribute to a natural selection of bad psychological science and scientists, which further enhances, and contributes to, issues 1 and 2. Then it’s a repeating and reinforcing cycle of poor reasoning, poor rules and methods, poor research, and poor “scientists”.

    I see few corrective options and processes in all this. To me, there seems to be no true psychological science police, no psychological science court of law; there don’t even seem to be clear rules or laws. The (unjustified) popularity of rules, methods, and “scientists” is what mostly seems to drive behaviour. Psychological scientists are trusted to know what they are doing, and to assess and evaluate themselves. To me, this is all like marking your own homework. And I think it has been proven that this doesn’t work.

    And if through some miracle there comes a point in time that at least some awareness and reflection is present, it is all blamed on obscure and vaguely described things like “the incentives” as though they magically appeared out of thin air all of a sudden. After that, the same (incompetent? corrupt?) people that contributed to the mess are trusted to come up with solutions and improvements that might make things even worse (e.g. see Table 1 of Edwards & Roy 2017).

    • “After that, the same (incompetent? corrupt?) people that contributed to the mess are trusted to come up with solutions and improvements that might make things even worse (e.g. see Table 1 of Edwards & Roy 2017).”

      P-rep for the win!

      • When I was following developments some years ago, one of the things I wondered more and more was which of the proposed improvements and solutions regarding the problematic issues in academia and Psychological Science might end up being a new addition to an updated Table 1 of Edwards & Roy (2017).

    • It’s quite interesting to me how a certain perspective can dramatically change. I can clearly remember, when starting my education at university, that I viewed the scientific literature as a collection of facts, truths, valid evidence, etc. Later on I realized that, due to several issues like questionable research practices, the file drawer problem, etc., things might not be as factual, truthful, and valid as I once thought and assumed.

      I also once thought that scientists at universities were the smartest and wisest people available, which I don’t think is the case anymore at this point in time. I once thought students at universities are generally taught well, which I don’t think anymore at this point in time. I once thought scientists at universities had good and noble intentions, which at this point in time I don’t think is necessarily the case for some and perhaps many scientists. I once thought that scientific evidence and reasoning is important and useful and positive, which I now think might not necessarily be the case.

      I also think I might have thought about some things here and there and concluded that facts, knowledge, and truths coming from science might be more of a reflection of the people that came up with hypotheses and experiments and reasoning than it is a valid or balanced reflection of reality. And I have thought that there may be countless valid and useful thoughts, reasoning, and hypotheses that are not expressed, heard, and further investigated.

      If I combine such thoughts, I have come to realize that what I once thought was a great environment to be in or an interesting subject to be involved with has become something almost completely opposite of that for me. It’s interesting, and somewhat shocking, to me that in a relatively short time I have come to view things very differently. I sometimes wonder if this also might be the case for some scientists when they look back at their academic career. There must be some papers of scientists concluding similar things, and deciding that academia/science is not for them. Or there might be some papers of scientists looking back at their career and wondering what it was all about, and whether it was worth it. I think these type of reflective papers might be very useful and important.

  27. Quote from above: “So . . . whassup? What happened? Why did it take us nearly 50 years to what Meehl was saying all along? This is what I want the intellectual history to help me understand.”

    I have been thinking about this from time to time, and I wonder whether it might really not matter, for a few possibly crucial purposes, whether lots of social science is true or correct:

    1) lots of students pay lots of money to be educated in this stuff at universities. Perhaps they are being told fairytales, and being educated in nonsense. It doesn’t matter as long as they pay money, and there is a general idea that they are being educated in something valid and worthwhile.

    2) scientific publishing companies earn lots of money to publish all this stuff in journals. Perhaps they are publishing fairytales and nonsense. Perhaps the findings turn out to be non-replicable. Perhaps the findings have dozens of caveats concerning them which may only become apparent years down the road. It all doesn’t matter as long as they earn money from what they publish, and there is a general idea that they publish something valid.

    3) people and organizations use this stuff as grounds for choosing, doing, or implementing something. Perhaps the reasons, evidence, and argumentation behind these decisions and actions are built on fairytales and nonsense. Perhaps that doesn’t matter (or might even be useful for some people), as long as there is a general idea that this stuff is valid to use in making these kinds of decisions.

    Maybe the big value of this stuff lies in these 3 things, which make people lots of money and give them power to do stuff. And for these 3 things it doesn’t matter that some finding is not true, or turns out to be non-replicable in 5 years’ time, or whatever. All the use may have already been extracted from it by then…

    • Quote from above: “And it doesn’t matter concerning these 3 things that some finding is not true or turns out to be non-replicable in 5 years time or whatever. Every use of this stuff may have already been extracted from it by that time…”

      If I am not mistaken, journal impact factors are calculated over a window of at most 2 to 5 years. If this is correct, in some cases at least, I wonder how this relates to the above quote.

      Perhaps journal impact factors were developed, and are used, to focus on and reward short-term thinking, which in turn makes it easier for journals to keep publishing this stuff over and over and over again.

      I have often wondered whether it might be interesting for someone to calculate and publish an alternative impact factor based on all the years a journal has existed. To me, that would make more scientific sense than an impact factor based on a 5-year window or something like that.
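
      For concreteness, here is a small sketch of the standard two-year impact factor (roughly: citations received in the census year to items published in the previous two years, divided by the number of items published in those years) next to the whole-lifetime variant suggested above. The journal data are made up:

      ```python
      # Two-year impact factor vs an "all years the journal has existed" variant.
      def impact_factor(citations_by_pub_year, items_by_pub_year, census_year, window=2):
          """citations_by_pub_year[y] = citations received in census_year to items published in y."""
          years = range(census_year - window, census_year)
          cites = sum(citations_by_pub_year.get(y, 0) for y in years)
          items = sum(items_by_pub_year.get(y, 0) for y in years)
          return cites / items if items else float("nan")

      # Hypothetical journal founded in 2000, census year 2016: recent papers are
      # cited heavily, older ones hardly at all.
      items = {y: 100 for y in range(2000, 2016)}
      cites = {y: 300 if y >= 2014 else 40 for y in range(2000, 2016)}

      print("2-year IF:  ", impact_factor(cites, items, census_year=2016, window=2))
      print("lifetime IF:", impact_factor(cites, items, census_year=2016, window=16))
      ```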
