Preregistration is a floor, not a ceiling.

This comes up from time to time. For example, someone sent me an email expressing a concern that preregistration stifles innovation: if Fleming had preregistered his study, he never would’ve noticed the penicillin mold, etc.

My response is that preregistration is a floor, not a ceiling. Preregistration is a list of things you plan to do, that’s all. Preregistration does not stop you from doing more. If Fleming had followed a pre-analysis protocol, that would’ve been fine: there would have been nothing stopping him from continuing to look at his bacterial cultures.

As I wrote in comments to my 2022 post, “What’s the difference between Derek Jeter and preregistration?” (which I just added to the lexicon), you don’t preregister “the” exact model specification; you preregister “an” exact model specification, and you’re always free to fit other models once you’ve seen the data.

It can be really valuable to preregister, to formulate hypotheses and simulate fake data before gathering any real data. To do this requires assumptions—it takes work!—and I think it’s work that’s well spent. And then, when the data arrive, do everything you’d planned to do, along with whatever else you want to do.

Planning ahead should not get in the way of creativity. It should enhance creativity because you can focus your data-analytic efforts on new ideas rather than having to first figure out what defensible default thing you’re supposed to do.

Aaaand, pixels are free, so here’s that 2022 post in full:

“Hot hand”: The controversy that shouldn’t be. And thinking more about what makes something into a controversy:

I was involved in a recent email discussion, leading to this summary:

There is no theoretical or empirical reason for the hot hand to be controversial. The only good reason for there being a controversy is that the mistaken paper by Gilovich et al. appeared first. At this point we should give Gilovich et al. credit for bringing up the hot hand as a subject of study and accept that they were wrong in their theory, empirics, and conclusions, and we can all move on. There is no shame in this for Gilovich et al. We all make mistakes, and what’s important is not the personalities but the research that leads to understanding, often through tortuous routes.

“No theoretical reason”: see discussion here, for example.

“No empirical reason”: see here and lots more in the recent literature.

“The only good reason . . . appeared first”: Beware the research incumbency rule.

More generally, what makes something a controversy? I’m not quite sure, but I think the news media play a big part. We talked about this recently in the context of the always-popular UFOs-as-space-aliens theory, which used to be considered a joke in polite company but now seems to have reached the level of controversy.

I don’t have anything systematic to say about all this right now, but the general topic seems very worthy of study.

“Here’s the Unsealed Report Showing How Harvard Concluded That a Dishonesty Expert Committed Misconduct”

Stephanie Lee has the story:

Harvard Business School’s investigative report into the behavioral scientist Francesca Gino was made public this week, revealing extensive details about how the institution came to conclude that the professor committed research misconduct in a series of papers.

The nearly 1,300-page document was unsealed after a Tuesday ruling from a Massachusetts judge, the latest development in a $25 million lawsuit that Gino filed last year against Harvard University, the dean of the Harvard Business School, and three business-school professors who first notified Harvard of red flags in four of her papers. All four have been retracted. . . .

According to the report, dated March 7, 2023, one of Gino’s main defenses to the committee was that the perpetrator could have been someone else — someone who had access to her computer, online data-storage account, and/or data files.

Gino named a professor as the most likely suspect. The person’s name was redacted in the released report, but she is identified as a female professor who was a co-author of Gino’s on a 2012 now-retracted paper about inducing honest behavior by prompting people to sign a form at the top rather than at the bottom. . . . Allegedly, she was “angry” at Gino for “not sufficiently defending” one of their collaborators “against perceived attacks by another co-author” concerning an experiment in the paper.

But the investigation committee did not see a “plausible motive” for the other professor to have committed misconduct by falsifying Gino’s data. “Gino presented no evidence of any data falsification actions by actors with malicious intentions,” the committee wrote. . . .

Gino’s other main defense, according to the report: Honest errors may have occurred when her research assistants were coding, checking, or cleaning the data. . . .

Again, however, the committee wrote that “she does not provide any evidence of [research assistant] error that we find persuasive in explaining the major anomalies and discrepancies.”

The full report is at the link.

Some background is here, also here, and some reanalyses of the data are linked here.

Now we just have to get to the bottom of the story about the shredder and the 80-pound rock and we’ll pretty much have settled all the open questions in this field.

We’ve already determined that the “burly coolie” story and the “smallish town” story never happened.

It’s good we have dishonesty experts. There’s a lot of dishonesty out there.

Refuted papers continue to be cited more than their failed replications: Can a new search engine be built that will fix this problem?

Paul von Hippel writes:

Stuart Buck noticed your recent post on A WestLaw for Science. This is something that Stuart and I started talking about last year, and Stuart, who trained as an attorney, believes it was first suggested by a law professor about 15 years ago.

Since the 19th century, the legal profession has had citation indices that do far more than count citations and match keywords. Resources like Shepard’s Citations—first printed in 1873 and now published online along with competing tools such as JustCite, KeyCite, BCite, and SmartCite—do not just find relevant cases and statutes; they show lawyers whether a case or statute is still “good law.” Legal citation indexes show lawyers which cases have been affirmed or cited approvingly, and which have been criticized, reversed, or overruled by later courts.

Although Shepard’s Citations inspired the first Science Citation Index in 1960, which in turn inspired tools like Google Scholar, today’s academic search engines still rely primarily on citation counts and keywords. As a result, many scientists are like lawyers who walk into the courtroom unaware that a case central to their argument has been overruled.

Kind of, but not quite. A key difference is that in the courtroom there is some reasonable chance that the opposing lawyer or the judge will notice that the key case has been overruled, so that your argument that hinges on that case will fail. You have a clear incentive to not rely on overruled cases. In science, however, there’s no opposing lawyer and no judge: you can build an entire career on studies that fail to replicate, and no problem at all, as long as you don’t pull any really ridiculous stunts.

Von Hippel continues:

Let me share a couple of relevant articles that we recently published.

One, titled “Is Psychological Science Self-Correcting?”, reports that replication studies, whether successful or unsuccessful, rarely have much effect on citations to the studies being replicated. When a finding fails to replicate, most influential studies sail on, continuing to gather citations at a similar rate for years, as though the replication had never been tried. The issue is not limited to psychology and raises serious questions about how quickly the scientific community corrects itself, and whether replication studies are having the correcting influence that we would like them to have. I considered several possible reasons for the persistent influence of studies that failed to replicate, and concluded that academic search engines like Google Scholar may well be part of the problem, since they prioritize highly cited articles, replicable or not, perpetuating the influence of questionable findings.

The finding that replications don’t affect citations has itself replicated pretty well. A recent blog post by Bob Reed at the University of Canterbury, New Zealand, summarized five recent papers that showed more or less the same thing in psychology, economics, and Nature/Science publications.

In a second article, published just last week in Nature Human Behaviour, Stuart Buck and I suggest ways to improve academic search engines to reduce scholars’ biases. We suggest that the next generation of academic search engines should do more than count citations; they should help scholars assess studies’ rigor and reliability. We also suggest that future engines should be transparent, responsive, and open source.

This seems like a reasonable proposal. The good news is that it’s not necessary for their hypothetical new search engine to dominate or replace existing products. People can use Google Scholar to find the most cited papers and use this new thing to inform about rigor and reliability. A nudge in the right direction, you might say.

A new piranha paper

Kris Hardies points to this new article, Impossible Hypotheses and Effect-Size Limits, by Wijnand and Lennert van Tilburg, which states:

There are mathematical limits to the magnitudes that population effect sizes can take within the common multivariate context in which psychology is situated, and these limits can be far more restrictive than typically assumed. The implication is that some hypothesized or preregistered effect sizes may be impossible. At the same time, these restrictions offer a way of statistically triangulating the plausible range of unknown effect sizes.

This is closely related to our Piranha Principle, which we first formulated here and then followed up with this paper. It’s great to see more work being done in this area.

With journals, it’s all about the wedding, never about the marriage.

John “not Jaws” Williams writes:

Here is another example of how hard it is to get erroneous publications corrected, this time from the climatology literature, and how poorly peer review can work.

From the linked article, by Gavin Schmidt:

Back in March 2022, Nicola Scafetta published a short paper in Geophysical Research Letters (GRL) . . . We (me, Gareth Jones and John Kennedy) wrote a note up within a couple of days pointing out how wrongheaded the reasoning was and how the results did not stand up to scrutiny. . . .

After some back and forth on how exactly this would work (including updating the GRL website to accept comments), we reformatted our note as a comment, and submitted it formally on December 12, 2022. We were assured from the editor-in-chief and publications manager that this would be a ‘streamlined’ and ‘timely’ review process. With respect to our comment, that appeared to be the case: It was reviewed, received minor comments, was resubmitted, and accepted on January 28, 2023. But there it sat for 7 months! . . .

The issue was that the GRL editors wanted to have both the comment and a reply appear together. However, the reply had to pass peer review as well, and that seems to have been a bit of a bottleneck. But while the reply wasn’t being accepted, our comment sat in limbo. Indeed, the situation inadvertently gives the criticized author(s) an effective delaying tactic since, as long as a reply is promised but not delivered, the comment doesn’t see the light of day. . . .

All in all, it took 17 months, two separate processes, and dozens of emails, who knows how much internal deliberation, for an official comment to get into the journal pointing issues that were obvious immediately the paper came out. . . .

The odd thing about how long this has taken is that the substance of the comment was produced extremely quickly (a few days) because the errors in the original paper were both commonplace and easily demonstrated. The time, instead, has been entirely taken up by the process itself. . . .

Schmidt also asks a good question:

Why bother? . . . Why do we need to correct the scientific record in formal ways when we have abundant blogs, PubPeer, and social media, to get the message out?

His answer:

Since journals remain extremely reluctant to point to third party commentary on their published papers, going through the journals’ own process seems like it’s the only way to get a comment or criticism noticed by the people who are reading the original article.

Good point. I’m glad that there are people like Schmidt and his collaborators who go to the trouble to correct the public record. I do this from time to time, but mostly I don’t like the stress of dealing with the journals so I’ll just post things here.

My reaction

This story did not surprise me. I’ve heard it a million times, and it’s often happened to me, which is why I once wrote an article called It’s too hard to publish criticisms and obtain data for replication.

Journal editors mostly hate to go back and revise anything. They’re doing volunteer work, and they’re usually in it because they want to publish new and exciting work. Replications, corrections, etc., that’s all seen as boooooring.

With journals, it’s all about the wedding, never about the marriage.

Mindlessness in the interpretation of a study on mindlessness (and why you shouldn’t use the word “whom” in your dating profile)

This is a long post, so let me give you the tl;dr right away: Don’t use the word “whom” in your dating profile.

OK, now for the story. Fasten your seat belts, it’s going to be a bumpy night.

It all started with this message from Dmitri with subject line, “Man I hate to do this to you but …”, which continued:

How could I resist?

https://www.cnbc.com/2024/02/15/using-this-word-can-make-you-more-influential-harvard-study.html

I’m sorry, let me try again … I had to send this to you BECAUSE this is the kind of obvious shit you like to write about. I like how they didn’t even do their own crappy study they just resurrected one from the distant past.

OK, ok, you don’t need to shout about it!

Following the link we see this breathless, press-release-style CNBC news story:

Using this 1 word more often can make you 50% more influential, says Harvard study

Sometimes, it takes a single word — like “because” — to change someone’s mind.

That’s according to Jonah Berger, a marketing professor at the Wharton School of the University of Pennsylvania who’s compiled a list of “magic words” that can change the way you communicate. Using the word “because” while trying to convince someone to do something has a compelling result, he tells CNBC Make It: More people will listen to you, and do what you want.

Berger points to a nearly 50-year-old study from Harvard University, wherein researchers sat in a university library and waited for someone to use the copy machine. Then, they walked up and asked to cut in front of the unknowing participant.

They phrased their request in three different ways:

“May I use the Xerox machine?”
“May I use the Xerox machine because I have to make copies?”
“May I use the Xerox machine because I’m in a rush?”
Both requests using “because” made the people already making copies more than 50% more likely to comply, researchers found. Even the second phrasing — which could be reinterpreted as “May I step in front of you to do the same exact thing you’re doing?” — was effective, because it indicated that the stranger asking for a favor was at least being considerate about it, the study suggested.

“Persuasion wasn’t driven by the reason itself,” Berger wrote in a book on the topic, “Magic Words,” which published last year. “It was driven by the power of the word.” . . .

Let’s look into this claim. The first thing I did was click to the study—full credit to CNBC Make It for providing the link—and here’s the data summary from the experiment:

If you look carefully and do some simple calculations, you’ll see that the percentage of participants who complied was 37.5% under treatment 1, 50% under treatment 2, and 62.5% under treatment 3. So, ok, it’s not literally true that both requests using “because” made the people already making copies more than 50% more likely to comply: 0.50/0.375 = 1.33, and an increase of 33% is not “more than 50%.” But, sure, it’s a positive result. There were 40 participants in each treatment, so the standard error is approximately 0.5/sqrt(40) = 0.08 for each of those averages. The key difference here is 0.50 – 0.375 = 0.125, that’s the difference between the compliance rates under the treatments “May I use the Xerox machine?” and “May I use the Xerox machine because I have to make copies?”, and this will have a standard error of approximately sqrt(2)*0.08 = 0.11.

The quick summary from this experiment: an observed difference in compliance rates of 12.5 percentage points, with a standard error of 11 percentage points. I don’t want to say “not statistically significant,” so let me just say that the estimate is highly uncertain, so I have no real reason to believe it will replicate.

But wait, you say: the paper was published. Presumably it has a statistically significant p-value somewhere, no? The answer is, yes, they have some “p < .05” results, just not for that particular comparison. Indeed, if you just look at the top rows of that table (Favor = small), then the difference is 0.93 – 0.60 = 0.33 with a standard error of sqrt(0.6*0.4/15 + 0.93*0.07/15) = 0.14, so that particular estimate is just more than two standard errors away from zero. Whew!

But now we’re getting into forking paths territory:

- Noisy data
- Small sample
- Lots of possible comparisons
- Any comparison that’s statistically significant will necessarily be huge
- Open-ended theoretical structure that could explain just about any result.

I’m not saying the researchers were trying to do anything wrong. But remember, honesty and transparency are not enuf. Such a study is just too noisy to be useful.
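If it helps to see those numbers in one place, here’s a quick back-of-the-envelope script (mine, not from the paper); the counts per cell are inferred from the percentages quoted above, so treat it as a sketch rather than a reanalysis of the original data:

```python
# Back-of-the-envelope check of the copier-study numbers quoted above.
# Cell counts are inferred from the reported percentages, so this is a
# sketch, not a reanalysis of the original data.
import math

def se_prop(p, n):
    """Standard error of a sample proportion."""
    return math.sqrt(p * (1 - p) / n)

n = 40  # participants per treatment in the full comparison
p_request_only = 0.375  # "May I use the Xerox machine?"
p_empty_reason = 0.50   # "... because I have to make copies?"
p_real_reason  = 0.625  # "... because I'm in a rush?"

diff = p_empty_reason - p_request_only
se_diff = math.sqrt(se_prop(p_empty_reason, n)**2 + se_prop(p_request_only, n)**2)
print(f"empty reason vs. request only: {diff:.3f} (se {se_diff:.2f})")
# about 0.125 with a standard error of about 0.11: highly uncertain

# "Small favor" rows only, roughly 15 people per cell
n_small = 15
diff_small = 0.93 - 0.60
se_small = math.sqrt(se_prop(0.93, n_small)**2 + se_prop(0.60, n_small)**2)
print(f"small-favor comparison: {diff_small:.2f} (se {se_small:.2f})")
# about 0.33 with a standard error of about 0.14: just over two se's from zero
```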

But, sure, back in the 1970s many psychology researchers not named Meehl weren’t aware of these issues. They seem to have been under the impression that if you gather some data and find something statistically significant for which you could come up with a good story, that you’d discovered a general truth.

What’s less excusable is a journalist writing this in the year 2024. But it’s no surprise, conditional on the headline, “Using this 1 word more often can make you 50% more influential, says Harvard study.”

But what about that book by the University of Pennsylvania marketing professor? I searched online, and, fortunately for us, the bit about the Xerox machine is right there in the first chapter, in the excerpt we can read for free. Here it is:

He got it wrong, just like the journalist did! It’s not true that including the meaningless reason increased persuasion just as much as the valid reason did. Look at the data! The outcomes under the three treatments were 37.5%, 50%, and 62.5%. 50% – 37.5% ≠ 62.5% – 37.5%. Ummm, ok, he could’ve said something like, “Among a selected subset of the data with only 15 or 16 people in each treatment, including the meaningless reason increased persuasion just as much as the valid reason did.” But that doesn’t sound so impressive! Even if you add something like, “and it’s possible to come up with a plausible theory to go with this result.”

The book continues:

Given the flaws in the description of the copier study, I’m skeptical about these other claims.

But let me say this. If it is indeed true that using the word “whom” in online dating profiles makes you 31% more likely to get a date, then my advice is . . . don’t use the word “whom”! Think of it from a potential-outcomes perspective. Sure, you want to get a date. But do you really want to go on a date with someone who will only go out with you if you use the word “whom”?? That sounds like a really pretentious person, not a fun date at all!

OK, I haven’t read the rest of the book, and it’s possible that somewhere later on the author says something like, “OK, I was exaggerating a bit on page 4 . . .” I doubt it, but I guess it’s possible.

Replications, anyone?

To return to the topic at hand: In 1978 a study was conducted with 120 participants in a single location. The study was memorable enough to be featured in a business book nearly fifty years later.

Surely the finding has been replicated?

I’d imagine yes; on the other hand, if it had been replicated, this would’ve been mentioned in the book, right? So it’s hard to know.

I did a search, and the article does seem to have been influential:

It’s been cited 1514 times—that’s a lot! Google lists 55 citations in 2023 alone, and in what seem to be legit journals: Human Communication Research, Proceedings of the ACM, Journal of Retailing, Journal of Organizational Behavior, Journal of Applied Psychology, Human Resources Management Review, etc. Not core science journals, exactly, but actual applied fields, with unskeptical mentions such as:

What about replications? I searched on *langer blank chanowitz 1978 replication* and found this paper by Folkes (1985), which reports:

Four studies examined whether verbal behavior is mindful (cognitive) or mindless (automatic). All studies used the experimental paradigm developed by E. J. Langer et al. In Studies 1–3, experimenters approached Ss at copying machines and asked to use it first. Their requests varied in the amount and kind of information given. Study 1 (82 Ss) found less compliance when experimenters gave a controllable reason (“… because I don’t want to wait”) than an uncontrollable reason (“… because I feel really sick”). In Studies 2 and 3 (42 and 96 Ss, respectively) requests for controllable reasons elicited less compliance than requests used in the Langer et al study. Neither study replicated the results of Langer et al. Furthermore, the controllable condition’s lower compliance supports a cognitive approach to social interaction. In Study 4, 69 undergraduates were given instructions intended to increase cognitive processing of the requests, and the pattern of compliance indicated in-depth processing of the request. Results provide evidence for cognitive processing rather than mindlessness in social interaction.

So this study concludes that the result didn’t replicate at all! On the other hand, it’s only a “partial replication,” and indeed they do not use the same conditions and wording as in the original 1978 paper. I don’t know why not, except maybe that exact replications traditionally get no respect.

Langer et al. responded in that journal, writing:

We see nothing in her results [Folkes (1985)] that would lead us to change our position: People are sometimes mindful and sometimes not.

Here they’re referring to the table from the 1978 study, reproduced at the top of this post, which shows a large effect of the “because I have to make copies” treatment under the “Small Favor” condition but no effect under the “Large Favor” condition. Again, given the huge standard errors here, we can’t take any of this seriously, but if you just look at the percentages without considering the uncertainty, then, sure, that’s what they found. Thus, in their response to the partial replication study that did not reproduce their results, Langer et al. emphasized that their original finding was not a main effect but an interaction: “People are sometimes mindful and sometimes not.”

That’s fine. Psychology studies often measure interactions, as they should: the world is a highly variable place.

But, in that case, everyone’s been misinterpreting that 1978 paper! When I say “everybody,” I mean this recent book by the business school professor and also the continuing references to the paper in the recent literature.

Here’s the deal. The message that everyone seems to have learned, or believed they learned, from the 1978 paper is that meaningless explanations are as good as meaningful explanations. But, according to the authors of that paper when they responded to criticism in 1985, the true message is that this trick works sometimes and sometimes not. That’s a much weaker message.

Indeed the study at hand is too small to draw any reliable conclusions about any possible interaction here. The most direct estimate of the interaction effect from the above table is (0.93 – 0.60) – (0.24 – 0.24) = 0.33, with a standard error of sqrt(0.93*0.07/15 + 0.60*0.40/15 + 0.24*0.76/25 + 0.24*0.76/25) = 0.19. So, no, I don’t see much support for the claim in this post from Psychology Today:

So what does this all mean? When the stakes are low people will engage in automatic behavior. If your request is small, follow your request with the word “because” and give a reason—any reason. If the stakes are high, then there could be more resistance, but still not too much.
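For completeness, here’s where that interaction standard error of 0.19 comes from: the same kind of rough calculation as before, using the percentages and cell sizes quoted above (again a sketch, with counts inferred rather than taken from the raw data):

```python
# Rough check of the interaction estimate discussed above:
# (small-favor effect) minus (large-favor effect), with its standard error.
import math

def var_prop(p, n):
    """Variance of a sample proportion."""
    return p * (1 - p) / n

interaction = (0.93 - 0.60) - (0.24 - 0.24)
se = math.sqrt(var_prop(0.93, 15) + var_prop(0.60, 15)
               + var_prop(0.24, 25) + var_prop(0.24, 25))
print(f"interaction estimate: {interaction:.2f}, standard error: {se:.2f}")
# about 0.33 with a standard error of about 0.19: far too noisy to interpret
```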

This happens a lot in unreplicable or unreplicated studies: a result is found under some narrow conditions, and then it is taken to have very general implications. This is just an unusual case where the authors themselves pointed out the issue. As they wrote in their 1985 article:

The larger concern is to understand how mindlessness works, determine its consequences, and specify better the conditions under which it is and is not likely to occur.

That’s a long way from the claim in that business book that “because” is a “magic word.”

Like a lot of magic, it only works under some conditions, and you can’t necessarily specify those conditions ahead of time. It works when it works.

There might be other replication studies of this copy machine study. I guess you couldn’t really do it now, because people don’t spend much time waiting at the copier. But the office copier was a thing for several decades. So maybe there are even some exact replications out there.

In searching for a replication, I did come across this post from 2009 by Mark Liberman that criticized yet another hyping of that 1978 study, this time from a paper by psychologist Daniel Kahneman in the American Economic Review. Kahneman wrote:

Ellen J. Langer et al. (1978) provided a well-known example of what she called “mindless behavior.” In her experiment, a confederate tried to cut in line at a copying machine, using various preset “excuses.” The conclusion was that statements that had the form of an unqualified request were rejected (e.g., “Excuse me, may I use the Xerox machine?”), but almost any statement that had the general form of an explanation was accepted, including “Excuse me, may I use the Xerox machine because I want to make copies?” The superficiality is striking.

As Liberman writes, this represented a “misunderstanding of the 1978 paper’s results, involving both a different conclusion and a strikingly overgeneralized picture of the observed effects.” Liberman performs an analysis of the data from that study which is similar to what I have done above.

Liberman summarizes:

The problem with Prof. Kahneman’s interpretation is not that he took the experiment at face value, ignoring possible flaws of design or interpretation. The problem is that he took a difference in the distribution of behaviors between one group of people and another, and turned it into generic statements about the behavior of people in specified circumstances, as if the behavior were uniform and invariant. The resulting generic statements make strikingly incorrect predictions even about the results of the experiment in question, much less about life in general.

Mindfulness

The key claim of all this research is that people are often mindless: they respond to the form of a request without paying attention to its context, with “because” acting as a “magic word.”

I would argue that this is exactly the sort of mindless behavior being exhibited by the people who are promoting that copying-machine experiment! They are taking various surface aspects of the study and using them to draw large, unsupported conclusions, without being mindful of the details.

In this case, the “magic words” are things like “p < .05,” “randomized experiment,” “Harvard,” “peer review,” and “Journal of Personality and Social Psychology” (this notwithstanding). The mindlessness comes from not looking into what exactly was in the paper being cited.

In conclusion . . .

So, yeah, thanks for nothing, Dmitri! Three hours of my life spent going down a rabbit hole. But, hey, if any readers who are single have read far enough down in the post to see my advice not to use “whom” in your dating profile, it will all have been worth it.

Seriously, though, the “mindlessness” aspect of this story is interesting. The point here is not, Hey, a 50-year-old paper has some flaws! Or the no-less-surprising observation: Hey, a pop business book exaggerates! The part that fascinates me is that there’s all this shaky research that’s being taken as strong evidence that consumers are mindless—and the people hyping these claims are themselves demonstrating the point by mindlessly following signals without looking into the evidence.

The ultimate advice that the mindfulness gurus are giving is not necessarily so bad. For example, here’s the conclusion of that online article about the business book:

Listen to the specific words other people use, and craft a response that speaks their language. Doing so can help drive an agreement, solution or connection.

“Everything in language we might use over email at the office … [can] provide insight into who they are and what they’re going to do in the future,” says Berger.

That sounds ok. Just forget all the blather about the “magic words” and the “superpowers,” and forget the unsupported and implausible claim that “Arguments, requests and presentations aren’t any more or less convincing when they’re based on solid ideas.” As often is the case, I think these Ted-talk style recommendations would be on more solid ground if they were just presented as the product of common sense and accumulated wisdom, rather than leaning on some 50-year-old psychology study that just can’t bear the weight. But maybe you can’t get the airport book and the Ted talk without a claim of scientific backing.

Don’t get me wrong here. I’m not attributing any malign motivations to any of the people involved in this story (except for Dmitri, I guess). I’m guessing they really believe all this. And I’m not using “mindless” as an insult. We’re all mindless sometimes—that’s the point of the Langer et al. (1978) study; it’s what Herbert Simon called “bounded rationality.” The trick is to recognize your areas of mindlessness. If you come to an area where you’re being mindless, don’t write a book about it! Even if you naively think you’ve discovered a new continent. As Mark Twain apparently never said, it ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.

The usual disclaimer

I’m not saying the claims made by Langer et al. (1978) are wrong. Maybe it’s true that, under conditions of mindlessness, all that matters is the “because” and any empty explanation will do; maybe the same results would show up in a preregistered replication. All I’m saying is that the noisy data that have been presented don’t provide any strong evidence in support of such claims, and that’s what bothers me about all those confident citations in the business literature.

P.S.

After writing the above post, I sent this response to Dmitri:

OK, I just spent 3 hours on this. I now have to figure out what to do with this after blogging it, because I think there are some important points here. Still, yeah, you did a bad thing by sending this to me. These are 3 hours I could’ve spent doing real work, or relaxing . . .

He replied:

I mean, yeah, that’s too bad for you, obviously. But … try to think about it from my point of view. I am more influential, I got you to work on this while I had a nice relaxing post-Valentine’s day sushi meal with my wife (much easier to get reservations on the 15th and the flowers are a lot cheaper), while you were toiling away on what is essentially my project. I’d say the magic words did their job.

Good point! He exploited my mindlessness. I responded:

Ok, I’ll quote you on that one too! (minus the V-day details).

I’m still chewing on your comment that you appreciate the Beatles for their innovation as much as for their songs. The idea that there are lots of songs of similar quality but not so much innovation, that’s interesting. The only thing is that I don’t know enough about music, even pop music, to have a mental map of where everything fits in. For example, I recently heard that Coldplay song, and it struck me that it was in the style of U2. But I don’t really know if U2 was the originator of that soaring sound. I guess Pink Floyd is kinda soaring too, but not quite in the same way . . . etc etc . . . the whole thing was frustrating to me because I had no sense of whether I was entirely bullshitting or not.

So if you can spend 3 hours writing a post on the above topic, we’ll be even.

Dmitri replied:

I am proud of the whole “Valentine’s day on the 15th” trick, so you are welcome to include it. That’s one of our great innovations. After the first 15-20 Valentine’s days, you can just move the date a day later and it is much easier.

And, regarding the music, he wrote:

U2 definitely invented a sound, with the help of their producer Brian Eno.

It is a pretty safe bet that every truly successful musician is an innovator—once you know the sound it is easy enough to emulate. Beethoven, Charlie Parker, the Beatles, all the really important guys invented a forceful, effective new way of thinking about music.

U2 is great, but when I listened to an entire U2 song from beginning to end, it seemed so repetitive as to be unlistenable. I don’t feel that way about the Beatles or REM. But just about any music sounds better to me in the background, which I think is a sign of my musical ignorance and tone-deafness (for real, I’m bad at recognizing pitches) more than anything else. I guess the point is that you’re supposed to dance to it, not just sit there and listen.

Anyway, I warned Dmitri about what would happen if I post his Valentine’s Day trick:

I post this, then it will catch on, and it will no longer work . . . just warning ya! You’ll have to start doing Valentine’s Day on the 16th, then the 17th, . . .

To which Dmitri responded:

Yeah but if we stick with it, it will roll around and we will get back to February 14 while everyone else is celebrating Valentines Day on these weird wrong days!

I’ll leave him with the last word.

“Exclusive: Embattled dean accused of plagiarism in NSF report” (yup, it’s the torment executioners)

The story is at Retraction Watch:

Erick Jones, the dean of engineering at the University of Nevada in Reno, appears to have engaged in extensive plagiarism in the final report he submitted to the National Science Foundation for a grant, Retraction Watch has learned.

The $28,238 grant partially supported a three-day workshop that Jones and his wife, Felicia Jefferson, held for 21 students in Washington, DC, in April 2022 titled “Broadening Participation in Engineering through Improved Financial Literacy.” Jefferson received a separate award for $21,757.

Jones submitted his final report to the agency in May 2023. Retraction Watch obtained a copy of that report through a public records request to Jones’s previous employer, the University of Texas at Arlington, and identified three published sources of extended passages he used without citation or quotation marks. . . .

Lots more details at the link.

Those torment executioners keep on tormenting us.

In all seriousness, between the University of Nevada salary and the National Science Foundation grants, this guy’s been taking a lot of public funds to produce some really bad work. Seems like a real failure of oversight at UNR and NSF to let this go on like this.

Good work by Retraction Watch to follow up on this story.

P.S. I forgot to include the quotations from UNR luminaries:

“In Erick Jones, our University has a dynamic leader who understands how to seize moments of opportunity in order to further an agenda of excellence,” University President Brian Sandoval said.

“What is exciting about having Jones as our new dean for the College of Engineering is how he clearly understands the current landscape for what it means to be a Carnegie R1 ‘Very High Research’ institution,” Provost Jeff Thompson said. “He very clearly understands how we can amplify every aspect of our College of Engineering, so that we can continue to build transcendent programs for engineering education and research.”

Also this:

Jones was on a three-year rotating detail at National Science Foundation where he was a Program Director in the Engineering Directorate for Engineering Research Centers Program.

Shameful that he would work for NSF and then pay that back by taking their money and submitting a plagiarized report. But, hey, I guess that’s what University President Brian Sandoval would call “understanding how to seize moments of opportunity in order to further an agenda of excellence.”

What could be more excellent than taking government funds and using them to publish plagiarized reports and crayon drawings?

It sounds like it’s fine with UNR if their dean of engineering does this. I wonder what would happen to any UNR students who did this sort of thing? I guess they wouldn’t get paid $372,127 for it, but maybe the university could at least give them a discount on their tuition?

P.P.S. That all said, let’s not forget that there are much worse cases of corruption out there. The UNR case just particularly bothers me, partly because it’s close to what I do—except that when my colleagues get NSF funds, we don’t use them to produce plagiarized reports—and partly because the problems are so obvious: as discussed in our earlier post, you can look at the papers this dean of engineering had published and see that they are incoherent and have no content, even before getting into the plagiarism. It’s hard to believe that his hiring was a mere lack of oversight; you’d have to work really hard to not see the problems in his publications. But, yeah, there’s lots of much worse stuff going on that we read about in the newspaper every day.

On the border between credulity and postmodernism: The case of the UFO’s-as-space-aliens media insiders

I came across this post from Tyler Cowen:

From an email I [Cowen] sent to a well-known public intellectual:

I think the chance that the bodies turn out to be real aliens is quite low.

But the footage seems pretty convincing, a way for other people to see what…sources have been telling me for years. [Everyone needs to stop complaining that there are no photos!]

And to think it is a) the Chinese, b) USG secret project, or…whatever…*in Mexico* strains the imagination.

It is interesting of course how the media is not so keen to report on this. They don’t have to talk about the aliens, they could just run a story “The Mexican government has gone insane.” But they won’t do that, and so you should update your mental model of the media a bit in the “they are actually pretty conservative, in the literal sense of that term, and quite readily can act like a deer frozen in the headlights, though at some point they may lurch forward with something ill-conceived.”

Many of you readers are from Christian societies, or you are Christian. But please do not focus on the bodies! I know you are from your early upbringing “trained” to do so, even if you are a non-believer. Wait until that evidence is truly verified (and I suspect it will not be). Focus on the video footage.

In any case, the Mexican revelations [sic] mean this issue is not going away, and perhaps this will force the hand of the USG to say more than they otherwise would have.

The above-linked post seems ridiculous to me, while comments on the post are much more reasonable—I guess it’s not hard to be reasonable when all you have to do is laugh at a silly hoax.

From a straight-up econ point of view I guess it makes sense that there has been a continuing supply of purported evidence for space aliens: there’s a big demand for this sort of thing, so people will create some supply. It’s disappointing to me to see someone as usually-savvy as Cowen falling for this sort of thing, but there’s some selection bias here: I’m not writing about all the people out there who have not been snookered by this Bermuda triangle ancient astronauts Noah’s ark fairies haunted radios bigfoot ESP ghosts space aliens stuff.

Given my earlier post on news media insiders getting all excited about UFOs (also this), you won’t be surprised to hear that I’m annoyed by Cowen’s latest. It’s just so ridiculous! Amusingly, his phrasing, “I think the chance that the bodies turn out to be real aliens is quite low,” echoes that of fellow contrarian pundit Nate Silver, who wrote, “I’m not saying it’s aliens, it’s almost definitely not aliens.” Credit them for getting the probability on the right side of 50%, but . . . c’mon.

As I wrote in my earlier posts, what’s noteworthy is not that various prominent people think that UFOs might be space aliens (as I never tire of saying in this context, 30% of Americans say they believe in ghosts, which have pretty much the same basis in reality); rather, what’s interesting is that they feel so free to admit this belief. I attribute this to a sort of elite-media contagion: Ezra Klein and Tyler Cowen believe the space aliens thing is a possibility, they’re smart guys, so other journalists take it more seriously, etc. Those of us outside the bubble can just laugh, but someone like Nate Silver is too much of an insider and is subject to the gravitational pull of elite media, twitter, etc.

Mark Palko offers a slightly different take, attributing the latest burst of elite credulity to the aftereffects of a true believer who managed to place a few space-aliens-curious stories into the New York Times, which then gave the story an air of legitimacy etc.

The space aliens thing is interesting in part because it does not seem strongly connected to political polarization. You’ve got Cowen on the right, Klein on the left, and Silver on the center-left. OK, just three data points, but still. Meanwhile, Cowen gets a lot of far-right commenters, but most of the commenters to his recent post are with me on this one, just kind of baffled that he’s pushing the story.

Postmodernism

A couple days after seeing Cowen’s post, I happened to be reading a book that discussed postmodernism in the writing of history. I don’t care so much about postmodernism, but the book was interesting; I’ll discuss it in a future post.

In any case, here’s the connection I saw.

Postmodernism means different things to different people, but one of its key tenets is that there is no objective truth . . . uhhhh, let me just “do a wegman” here and quote wikipedia:

Postmodernism is an intellectual stance or mode of discourse which challenges worldviews associated with Enlightenment rationality dating back to the 17th century. Postmodernism is associated with relativism and a focus on the role of ideology in the maintenance of economic and political power. Postmodernists are “skeptical of explanations which claim to be valid for all groups, cultures, traditions, or races, and instead focuses on the relative truths of each person”. It considers “reality” to be a mental construct. Postmodernism rejects the possibility of unmediated reality or objectively-rational knowledge, asserting that all interpretations are contingent on the perspective from which they are made; claims to objective fact are dismissed as naive realism.

One thing that struck me about Cowen’s post was not just that he’s sympathetic to the space-aliens hypothesis; it also seems to bug him that the elite news media isn’t covering it more widely. Which is funny, because it bugs me that the media (including Bloomberg columnist Cowen) are taking it as seriously as they do!

Cowen writes, “It is interesting of course how the media is not so keen to report on this.” Doesn’t seem so interesting to me! My take is that most people in the media have some common sense and also have some sense of the history of this sort of nexus of hoaxes and credulity, from Arthur Conan Doyle onward.

The postmodernism that I see coming from Cowen is in the statement, “the footage seems pretty convincing, a way for other people to see what . . . sources have been telling me for years,” which seems to me, as a traditional rationalist or non-postmodernist, to be a form of circular reasoning: something is real because people believe in it. Saying “this issue is not going away” . . . I mean, sure, astrology isn’t going away either! Unfortunately, just about nothing ever seems to go away.

Oppositionism

There’s something else going on here; it’s hard for me to put my finger on, exactly . . . something about belief in the occult as being oppositional, something “they” don’t want you to know about, whether “they” is “the media” or “the government” or “organized religion” or “the patriarchy” or “the medical establishment” or whatever. As we discussed in an earlier post on this topic, one interesting thing is how things happen that push certain fringe beliefs into a zone where it’s considered legitimate to take them seriously. As a student of public opinion and politics, I’m interested not just in who has these beliefs and why, but also in the processes by which some such beliefs but not others circulate so that they seem perfectly normal to various people such as Cowen, Silver, etc., in the elite news media bubble.

When Steve Bannon meets the Center for Open Science: Bad science and bad reporting combine to yield another ovulation/voting disaster

The Kangaroo with a feather effect

A couple of faithful correspondents pointed me to this recent article, “Fertility Fails to Predict Voter Preference for the 2020 Election: A Pre-Registered Replication of Navarrete et al. (2010).”

It’s similar to other studies of ovulation and voting that we’ve criticized in the past (see for example pages 638-640 of this paper).

A few years ago I ran across the following recommendation for replication:

One way to put a stop to all this uncertainty: preregistration of studies of all kinds. It won’t quell existing worries, but it will help to prevent new ones, and eventually the truth will out.

My reaction was that this was way too optimistic. The ovulation-and-voting study had large measurement error and high levels of variation, and any underlying effects were small. And all this is made even worse because they were studying within-person effects using a between-person design. So any statistically significant difference they find may well be in the wrong direction and is essentially certain to be a huge overestimate. That is, the design has a high Type S error rate and a high Type M error rate.

And, indeed, that’s what happened with the replication. It was a between-person comparison (that is, each person was surveyed at only one time point), there was no direct measurement of fertility, and this new study was powered to only be able to detect effects that were much larger than would be scientifically plausible.

The result: a pile of noise.
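To make the Type S / Type M point concrete, here’s a small simulation sketch. It’s my own illustration, not the authors’ analysis: assume a true effect of 2 percentage points on a binary outcome and a between-person design with a few hundred respondents per group, then look at what the statistically significant estimates look like. All the specific numbers below are assumptions for illustration.

```python
# Simulation sketch (my own illustration, not the authors' analysis):
# with a tiny true effect and a noisy between-person design, the estimates
# that reach statistical significance are mostly huge overestimates
# (Type M error) and sometimes have the wrong sign (Type S error).
import numpy as np

rng = np.random.default_rng(1)
true_effect = 0.02   # assumed true difference in support: 2 percentage points
n_per_group = 275    # assumed group size: a few hundred respondents per group
n_sims = 10_000

sig_estimates = []
for _ in range(n_sims):
    a = rng.binomial(n_per_group, 0.50) / n_per_group                # comparison group
    b = rng.binomial(n_per_group, 0.50 + true_effect) / n_per_group  # "high fertility" group
    est = b - a
    se = np.sqrt(a * (1 - a) / n_per_group + b * (1 - b) / n_per_group)
    if abs(est / se) > 1.96:  # nominally "statistically significant"
        sig_estimates.append(est)

sig = np.array(sig_estimates)
print("share of simulations reaching significance:", len(sig) / n_sims)
print("Type S rate (significant estimates with the wrong sign):", np.mean(sig < 0))
print("Type M factor (mean |significant estimate| / true effect):", np.mean(np.abs(sig)) / true_effect)
```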

To the authors’ credit, their title leads right off with “Fertility Fails to Predict . . .” OK, not quite right, as they didn’t actually measure fertility, but at least they foregrounded their negative finding.

Bad Science

Is it fair for me to call this “bad science”? I think this description is fair. Let me emphasize that I’m not saying the authors of this study are bad people. Remember our principle that honesty and transparency are not enough. You can be of pure heart, but if you are studying a small and highly variable effect using a noisy design and crude measurement tools, you’re not going to learn anything useful. You might as well just be flipping coins or trying to find patterns in a table of random numbers. And that’s what’s going on here.

Indeed, this is one of the things that’s bothered me for years about preregistered replications. I love the idea of preregistration, and I love the idea of replication. These are useful tools for strengthening research that is potentially good research and for providing some perspective on questionable research that’s been done in the past. Even the mere prospect of preregistered replication can be a helpful conceptual tool when considering an existing literature or potential new studies.

But . . . if you take a hopelessly noisy design and preregister it, that doesn’t make it a good study. Put a pile of junk in a fancy suit and it’s still a pile of junk.

In some settings, I fear that “replication” is serving as a shiny object to distract people from the central issues of measurement, and I think that’s what’s going on here. The authors of this study were working with some vague ideas of evolutionary psychology, and they seem to be working under the assumption that, if you’re interested in theory X, the way to do science is to gather some data that have some indirect connection to X and then run some statistical analysis in order to make an up-or-down decision (“statistically significant / not significant” or “replicated / not replicated”).

Again, that’s not enuf! Science isn’t just about theory, data, analysis, and conclusions. It’s also about measurement. It’s quantitative. And some measurements and designs are just too noisy to be useful.

As we wrote a few years ago,

My criticism of the ovulation-and-voting study is ultimately quantitative. Their effect size is tiny and their measurement error is huge. My best analogy is that they are trying to use a bathroom scale to weigh a feather—and the feather is resting loosely in the pouch of a kangaroo that is vigorously jumping up and down.

At some point, a set of measurements is so noisy that biases in selection and interpretation overwhelm any signal and, indeed, nothing useful can be learned from them. I assume that the underlying effect size in this case is not zero—if we were to look carefully, we would find some differences in political attitude at different times of the month for women, also different days of the week for men and for women, and different hours of the day, and I expect all these differences would interact with everything—not just marital status but also age, education, political attitudes, number of children, size of tax bill, etc etc. There’s an endless number of small effects, positive and negative, bubbling around.

Bad Reporting

Bad science is compounded by bad reporting. Someone pointed me to a website called “The National Pulse,” which labels itself as “radically independent” but seems to be an organ of the Trump wing of the Republican party, and which featured this story, which they seem to have picked up from the notorious sensationalist site, The Daily Mail:

STUDY: Women More Likely to Vote Trump During Most Fertile Point of Menstrual Cycle.

A new scientific study indicates women are more likely to vote for former President Donald Trump during the most fertile period of their menstrual cycle. According to researchers from the New School for Social Research, led by psychologist Jessica L Engelbrecht, women, when at their most fertile, are drawn to the former President’s intelligence in comparison to his political opponents. The research occurred between July and August 2020, observing 549 women to identify changes in their political opinions over time. . . .

A significant correlation was noticed between women at their most fertile and expressing positive opinions towards former President Donald Trump. . . . the 2020 study indicated that women, while ovulating, were drawn to former President Trump because of his high degree of intelligence, not physical attractiveness. . . .

As I wrote above, I think that research study was bad, but, conditional on the bad design and measurement, its authors seem to have reported it honestly.

The news report adds new levels of distortion.

– The report states that the study observed women “to identify changes in their political opinions over time.” First, the study didn’t “observe” anyone; it was an online survey. Second, they didn’t identify any changes over time: the women in the study were surveyed only once!

– The report says something about “a significant correlation” and that “the study indicated that . . .” This surprised me, given that the paper itself was titled, “Fertility Fails to Predict Voter Preference for the 2020 Election.” How do you get from “fails to predict” to “a significant correlation”? I looked at the journal article and found the relevant bit:

Results of this analysis for all 14 matchups appear in Table 2. In contrast to the original study’s findings, only in the Trump-Obama matchup was there a significant relationship between conception risk and voting preference [r_pb (475) = −.106, p = .021] such that the probability of intending to vote for Donald J. Trump rose with conception risk.

Got it? They looked at 14 comparisons. Out of these, only one was “statistically significant” at the 5% level. This is the kind of thing you’d expect to see from pure noise, or the mathematical equivalent, which is a study with noisy measurements of small and variable effects. The authors write, “however, it is possible that this is a Type I error, as it was the only significant result across the matchups we analyzed,” which I think is still too credulous a way to put it; a more accurate summary would be to say that the data are consistent with null effects, which is no surprise given the realistic possible sizes of any effects in this very underpowered study.
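Just to spell out why one nominally significant result out of 14 is unremarkable: if all 14 comparisons were pure noise and (a simplifying assumption on my part) independent, you’d expect to see at least one p < .05 about half the time.

```python
# How often would pure noise produce at least one "significant" result out of
# 14 tests? (Simplifying assumption: the comparisons are independent.)
p_at_least_one = 1 - 0.95**14
print(round(p_at_least_one, 2))  # about 0.51
```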

The authors of the journal article also write, “Several factors may account for the discrepancy between our [lack of replication of] the original results.” They go on for six paragraphs giving possible theories—but never once considering the possibility that the original studies and theirs were just too noisy to learn anything useful.

Look. I don’t mind a bit of storytelling: why not? Storytelling is fun, and it can be a good way to think about scientific hypotheses and their implications. The reason we do social science is because we’re interested in the social world; we’re not just number crunchers. So I don’t mind that the authors had several paragraphs with stories. The problem is not that they’re telling stories, it’s that they’re only telling stories. They don’t ever reflect that this entire literature is chasing patterns in noise.

And this lack of reflection about measurement and effect size is destroying them! They went to all this trouble to replicate this old study, without ever grappling with that study’s fundamental flaw (see kangaroo picture at the top of this post). Again, I’m not saying that the authors are bad people or that they intend to mislead; they’re just doing bad, 2010-2015-era psychological science. They don’t know better, and they haven’t been well served by the academic psychology establishment, which has promoted and continues to promote this sort of junk science.

Don’t blame the authors of the bad study for the terrible distorted reporting

Finally, it’s not the authors’ fault that their study was misreported by the Daily Mail and that Steve Bannon-associated website. “Fails to Predict” is right there in the title of the journal article. If clickbait websites and political propagandists want to pull out that p = 0.02 result from your 14 comparisons and spin a tale around it, you can’t really stop them.

The Center for Open Science!

Science reform buffs will enjoy these final bits from the published paper:

Why we say that honesty and transparency are not enough:

Someone recently asked me some questions about my article from a few years ago, Honesty and transparency are not enough. I thought it might be helpful to summarize why I’ve been promoting this idea.

The central message in that paper is that reproducibility is great, but if a study is too noisy (with the bias and variance of measurements being large compared to any persistent underlying effects), then making it reproducible won’t solve those problems. I wrote it for three reasons:

(a) I felt that reproducibility (or, more generally, “honesty and transparency”) was being oversold, and I didn’t want researchers to think that just cos they drink the reproducibility elixir, their studies will then be good. Reproducibility makes it harder to fool yourself and others, but it does not turn a hopelessly noisy study into good science.

(b) Lots of researchers are honest and transparent in their work but still do bad research. I wanted to be able to say that the research is bad without implying that I think the researchers are being dishonest.

(c) Conversely, I was concerned that, when researchers heard about problems with bad research by others, they would think that the people who are doing that bad research are cheating in some way. This leads to the problem of researchers saying to themselves, “I’m honest, I don’t ‘p-hack,’ so my research can’t be bad.” Actually, though, lots of people do research that’s honest, transparent, and useless! That’s one reason I prefer to speak of “forking paths” rather than “p-hacking”: it’s less of an accusation and more of a description.
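To put a number on point (a), here’s a little simulation sketch, with made-up numbers, of what happens when the underlying effect is small relative to the noise: the identical analysis can be run honestly, transparently, and reproducibly every time, and the estimates are still mostly useless, with the “significant” ones exaggerating the effect and occasionally getting the sign wrong.

import numpy as np

rng = np.random.default_rng(1)

true_effect = 0.1    # hypothetical small underlying effect
noise_sd = 1.0       # measurement noise, large relative to the effect
n = 50               # small sample per group, as in many such studies
n_reps = 10_000      # rerunning the identical, fully reproducible analysis

se = noise_sd * np.sqrt(2 / n)                     # standard error of a two-group comparison
estimates = rng.normal(true_effect, se, size=n_reps)

significant = np.abs(estimates) > 1.96 * se
wrong_sign = significant & (estimates < 0)

print(f"Power: {significant.mean():.2f}")
print(f"P(wrong sign | significant): {wrong_sign.sum() / significant.sum():.2f}")
print(f"Average exaggeration among significant results: "
      f"{np.abs(estimates[significant]).mean() / true_effect:.1f}x")

No amount of preregistration or open data changes those numbers; only better measurements or bigger effects do.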

Scientific publishers busily thwarting science (again)

This post is by Lizzie.

I am working with some colleagues on how statistical methods may affect citation counts. For this, we needed to find some published papers. So one colleague started downloading some. And their university quickly showed up with the following:

Yesterday we received three separate systematic downloading warnings from publishers Taylor & Francis, Wiley and UChicago associating the activity with [… your] office desktop computer’s address. As allowed in our licenses with those publishers, they have already blocked access from that IP address and have asked us to investigate.

Unfortunately, all of those publishers specifically prohibit systematic downloading for any purpose, including legitimate bibliometric or citation analysis.

Isn’t that great? I review for all of these companies for free, in rare cases I pay them to publish my papers, and then they use all that money to do this? Oh, and the university library signed a contract so now they pay someone to send these emails… that’s just great. I know we all know this is a depressing cabal, but this one surprised me.

In other news, this photo is from my (other) colleague’s office, where I am visiting for a couple days.

Hey! A new (to me) text message scam! Involving a barfing dog!

Last year Columbia changed our phone system so now we can accept text messages. This can be convenient, and sometimes people reach me that way.

But then the other day this text came in:

And, the next day:

Someone’s dog has been vomiting, and this person is calling from two different numbers—home and work, perhaps? That’s too bad! I hope they reach the real Dr. Ella before the dog gets too sick.

Then this:

And now I started getting suspicious. How exactly does someone get my phone number as a wrong number for a veterinarian? I’ve had this work number for over 25 years! It could be that someone typed in a phone number wrong. But . . . how likely is it that two unrelated people (the owner of a sick dog and the seller of veterinary products) would mistype someone’s number in the exact same way on the exact same day?

Also, “Dr. Ella”? I get that people give their doctors nicknames like that, but in a message to the office they would use the doctor’s last name, no?

Meanwhile, these came in:

Lisa, Ella, whatever. Still it seemed like some kinda mixup, and I had no thought that it might be a scam until I came across this post from Max Read, “What’s the deal with all those weird wrong-number texts?”, which answered all my questions.

Apparently the veterinarian, the yachts, and all the rest, are just a pretext to get you involved in a conversation where the scammers then befriend you before stealing as much of your money as they can. Kinda mean, huh? Can’t they do something more socially beneficial, like do some politically incorrect p-hacking or something involving soup bowls or paper shredders? Or just plagiarize a book about giraffes?

Stabbers gonna stab — fraud edition

One of the themes of Dan Davies’s book, Lying for Money, was that fraudsters typically do their crimes over and over again, until they get caught. And then, when they are released from prison, they do it again. This relates to something I noticed in the Theranos story, which was that the fraud was in plain sight for many years and the fraudsters continued to operate in the open.

Also regarding that interesting overlap of science and business fraud, I noted:

There seem to have been two ingredients that allowed Theranos to work. And neither of these ingredients involved technology or medicine. No, the two things were:

1. Control of the narrative.

2. Powerful friends.

Neither of these came for free. Theranos’s leaders had to work hard, for long hours, for years and years, to maintain control of the story and to attract and maintain powerful friends. And they needed to be willing to lie.

The newest story

Ben Mathis-Lilley writes:

On Wednesday, the Department of Justice announced that it has arrested a 48-year-old Lakewood, New Jersey, man named Eliyahu “Eli” Weinstein on charges of operating, quote, “a Ponzi scheme.” . . . How did authorities know that Weinstein was operating a Ponzi scheme? For one thing, he allegedly told associates, while being secretly recorded, that he had “Ponzied” the money they were using to repay investors. . . . Weinstein is further said to have admitted while being recorded that he had hidden assets from federal prosecutors. (“I hid money,” he is said to have told his conspirators, warning them that they would “go to jail” if anyone else found out.) . . .

These stories of “least competent criminals” are always fun, especially when the crime is nonviolent so you don’t have to think too hard about the victims.

What brings this one to the next level is the extreme repeat-offender nature of the criminal:

There was also one particular element of Weinstein’s background that may have alerted the DOJ that he was someone to keep an eye on—namely, that he had just been released from prison after serving eight years of a 24-year sentence for operating Ponzi schemes. More specifically, Weinstein was sentenced to prison for operating a Ponzi scheme involving pretend real estate transactions, then given a subsequent additional sentence for operating a second Ponzi scheme, involving pretend Facebook stock purchases, that he conducted after being released from custody while awaiting trial on the original charges.

Kinda like when a speeding driver runs over some kid and then it turns out the driver had 842 speeding tickets and the cops had never taken away his car, except in this case there’s no dead kid and the perp had already received a 24-year prison sentence.

How is it that he got out after serving only 8 years, anyway?

In January 2021, Weinstein was granted clemency by President Donald Trump at the recommendation of, among others, “the lawyer Alan Dershowitz,” who has frequently been the subject of news coverage in recent years for his work representing Trump and his relationship with the late Jeffrey Epstein.

Ahhhhh.

This all connects to my items #1 and 2 above.

The way Weinstein succeeded (to the extent he could be considered a success) at fraud was control of the narrative. And he got his get-out-of-jail-free card from his powerful friends. “Finding your roots,” indeed.

Stabbers gonna stab

This all reminded me of a story that came out in the newspaper a few decades ago. Jack Henry Abbott was a convicted killer who published a book while in prison. Abbott’s book was supposed to be very good, and he was subsequently released on parole with the support of various literary celebrities including Norman Mailer. Shortly after his release, Abbott murdered someone else and returned to prison, where he spent the rest of his life.

The whole story was very sad, but what made it particularly bizarre was that Abbott’s first murder was a stabbing, his second murder was a stabbing, and his most prominent supporter, Mailer, was notorious for . . . stabbing someone.

Here’s some academic advice for you: Never put your name on a paper you haven’t read.

Success has many fathers, but failure is an orphan.

Jonathan Falk points to this news article by Tom Bartlett which has this hilarious bit:

What at first had appeared to be a landmark study . . . seemed more like an embarrassment . . .

[The second-to-last author of the paper,] Armando Solar-Lezama, a professor in the electrical-engineering and computer-science department at MIT and associate director of the university’s computer-science and artificial-intelligence laboratory, says he didn’t realize that the paper was going to be posted as a preprint. . . .

The driving force behind the paper, according to Solar-Lezama and other co-authors, was Iddo Drori, [the last author of the paper and] an associate professor of the practice of computer science at Boston University. . . . The two usually met once a week or so. . . .

Solar-Lezama says he was unaware of the sentence in the abstract that claimed ChatGPT could master MIT’s courses. “There was sloppy methodology that went into making a wild research claim,” he says. While he says he never signed off on the paper being posted, Drori insisted when they later spoke about the situation that Solar-Lezama had, in fact, signed off. . . .

Solar-Lezama and two other MIT professors who were co-authors on the paper put out a statement insisting that they hadn’t approved the paper’s posting . . . Drori didn’t agree to an interview for this story, but he did email a 500-word statement providing a timeline of how and when he says the paper was prepared and posted online. In that statement, Drori writes that “we all took active part in preparing and editing the paper” . . . The revised version doesn’t appear to be available online and the original version has been withdrawn. . . .

This reminds me of a piece of advice that someone once gave me: Never put your name on a paper you haven’t read.

Those annoying people-are-stupid narratives in journalism

Palko writes:

Journalists love people-are-stupid narratives, but, while I believe cognitive dissonance is real, I think the lesson here is not “To an enthusiastically trusting public, his failure only made his gifts seem more real” and is instead that we should all be more skeptical of simplistic and overused pop psychology.

It’s easier for me to just give the link above than to explain all the background. The story is interesting on its own, but here I just wanted to highlight this point that Palko makes. Yes, people can be stupid, but it’s frustrating to see journalists take a story of a lawsuit-slinging celebrity and try to twist it into a conventional pop-psychology narrative.

I love this paper but it’s barely been noticed.

Econ Journal Watch asked me and some others to contribute to an article, “What are your most underappreciated works?,” where each of us wrote 200 words or less about an article of ours that had received few citations.

Here’s what I wrote:

What happens when you drop a rock into a pond and it produces no ripples?

My 2004 article, Treatment Effects in Before-After Data, has only 23 citations and this goes down to 16 after removing duplicates and citations from me. But it’s one of my favorite papers. What happened?

It is standard practice to fit regressions using an indicator variable for treatment or control; the coefficient represents the causal effect, which can be elaborated using interactions. My article from 2004 argues that this default class of models is fundamentally flawed in considering treatment and control conditions symmetrically. To the extent that a treatment “does something” and the control “leaves you alone,” we should expect before-after correlation to be higher in the control group than in the treatment group. But this is not implied by the usual models.

My article presents three empirical examples from political science and policy analysis demonstrating the point. The article also proposes some statistical models. Unfortunately, these models are complicated and can be noisy to fit with small datasets. It would help to have robust tools for fitting them, along with evidence from theory or simulation of improved statistical properties. I still hope to do such work in the future, in which case perhaps this work will have the influence I hope it deserves.

Here’s the whole collection. The other contributors were Robert Kaestner, Robert A. Lawson, George Selgin, Ilya Somin, and Alexander Tabarrok.

My contribution got edited! I prefer my original version shown above; if you’re curious about the edited version, just follow the link and you can compare for yourself.
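To make the before-after asymmetry in that 2004 paper concrete, here’s a toy simulation (all the numbers are invented): in the control group the “after” measurement is basically the “before” measurement plus a little noise, while in the treatment group the effect varies from person to person, which drives the before-after correlation down.

import numpy as np

rng = np.random.default_rng(2)
n = 200                                   # people per group; all numbers invented

before = rng.normal(0, 1, size=2 * n)
treated = np.repeat([0, 1], n)

# Control "leaves you alone": after is before plus a little measurement noise.
# Treatment "does something": the effect varies across people, adding variance.
effect = rng.normal(0.5, 0.8, size=2 * n) * treated
after = before + 0.1 * rng.normal(size=2 * n) + effect

for g, name in [(0, "control"), (1, "treatment")]:
    r = np.corrcoef(before[treated == g], after[treated == g])[0, 1]
    print(f"{name}: corr(before, after) = {r:.2f}")

A regression with a symmetric treatment indicator won’t capture that pattern; that asymmetry is what the paper is about.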

Others of my barely-noticed articles

Most of my published articles have very few citations; it’s your usual Zipf or long-tailed thing. Some of those have narrow appeal and so, even if I personally like the work, it is understandable that they haven’t been cited much. For example, “Bayesian hierarchical classes analysis” (16 citations) took a lot of effort on our part and appeared in a good journal, but ultimately it’s on a topic that not many researchers are interested in. For another example, I enjoyed writing “Teaching Bayes to Graduate Students in Political Science, Sociology, Public Health, Education, Economics, . . .” (17 citations) and I think if it reached the right audience of educators it could have a real influence, but it’s not the kind of paper that gets built upon or cited very often. A couple of my ethics and statistics papers from my Chance column only have 14 citations each; no surprise given that nobody reads Chance. At one point I was thinking of collecting them into a book, as this could get more notice.

Some papers are great but only take you part of the way there. I really like my morphing paper with Cavan and Phil, “Using image and curve registration for measuring the goodness of fit of spatial and temporal predictions” (12 citations) and, again, it appeared in a solid journal, but it was more of a start than a finish to a research project. We didn’t follow it up, and it seems that nobody else did either.

Sometimes we go to the trouble of writing a paper and going through the review process, but then it gets so little notice that I ask myself in retrospect, why did we bother? For example, “Objective Randomised Blinded Investigation With Optimal Medical Therapy of Angioplasty in Stable Angina (ORBITA) and coronary stents: A case study in the analysis and reporting of clinical trials” has been cited only 5 times since its publication in 2019—and three of those citations were from me. It seems safe to say that this particular dropped rock produced few ripples.

What happened? That paper had a good statistical message and a good applied story, but we didn’t frame it in a general-enough way. Or . . . it wasn’t quite that, exactly. It’s not a problem of framing so much as of context.

Here’s what would’ve made the ORBITA paper work, in the sense of being impactful (i.e., useful): either a substantive recommendation regarding heart stents or a general recommendation (a “method”) regarding summarizing and reporting clinical studies. We didn’t have either of these. Rather than just getting the paper published, we should’ve done the hard work to move forward in one of those two directions. Or, maybe our strategy was ok if we can use this example in some future article. The article presented a great self-contained story that could be part of larger recommendations. But the story on its own didn’t have impact.

This is a good reminder that what typically makes a paper useful is whether it can actually get used by people. A starting point is the title. We should figure out who might find the contents of the article useful and design the title from there.

Or, for another example, consider “Extension of the Isobolographic Approach to Interactions Studies Between More than Two Drugs: Illustration with the Convulsant Interaction between Pefloxacin, Norfloxacin, and Theophylline in Rats” (5 citations). I don’t remember this one at all, and maybe it doesn’t deserve to be read—but if it does, maybe it should’ve been more focused on the general approach so it could’ve been more directly useful to people working in that field.

“Information, incentives, and goals in election forecasts” (21 citations). I don’t know what to say about this one. I like the article, it’s on a topic that lots of people care about, the title seems fine, but not much impact. Maybe more people will look at it in 2024? “Accounting for uncertainty during a pandemic” is another one with only 21 citations. For that one, maybe people are just sick of reading about the goddam pandemic. I dunno; I think uncertainty is an important topic.

The other issue with citations is that people have to find your paper before they would consider citing it. I guess that many people in the target audiences for our articles never even knew they existed. From that perspective, it’s impressive that anything new ever gets cited at all.

Here’s an example of a good title: “A simple explanation for declining temperature sensitivity with warming.” Only 25 citations so far, but I have some hopes for this one: the title really nails the message, so once enough people happen to come across this article one way or another, I think they’ll read it and get the point, and this will eventually show up in citations.

“Tables as graphs: The Ramanujan principle” (4 citations). OK, I love this paper too, but realistically it’s not useful to anyone! So, fair enough. Similarly with “‘How many zombies do you know?’ Using indirect survey methods to measure alien attacks and outbreaks of the undead” (6 citations). An inspired, hilarious effort in my opinion, truly a modern classic, but there’s no real reason for anyone to actually cite it.

“Should we take measurements at an intermediate design point?” (3 citations). This is the one that really bugs me. Crisp title, clean example, innovative ideas . . . it’s got it all. But it’s sunk nearly without a trace. I think the only thing to do here is to pursue the research further, get new results, and publish those. Maybe also set up the procedure more explicitly as a method, rather than just the solution to a particular applied problem.

Torment executioners in Reno, Nevada, keep tormenting us with their publications.

The above figures come from this article which is listed on this Orcid page (with further background here):

Horrifying as all this is, at least from the standpoint of students and faculty at the University of Nevada, not to mention the taxpayers of that state, I actually want to look into a different bizarre corner of the story.

Let me point you to a quote from a recent article in Retraction Watch:

The current editor-in-chief [of the journal that featured the above two images, along with lots more] . . . published a statement about the criticism on the journal’s website, where he took full responsibility for the journal’s shortcomings. “While you can argue on the merits, quality, or impact of the work it is all original and we vehemently disagree with anyone who says otherwise,” he wrote.

I don’t think that claim is true. In particular, I don’t think it’s correct to state, vehemently or otherwise, that the work published in that journal is “all original.” I say this on the evidence of this paragraph from the article that appeared there, an article we associate with the phrase “torment executioners”:

It appears that the original source of this material was an article that had appeared the year before in an obscure and perhaps iffy outlet called The Professional Medical Journal. From the abstract of the paper in that journal:

The scary thing is that if you google the current editor of the journal where the apparent bit of incompetent plagiarism was published, you’ll see that this is his first listed publication:

Just in case you were wondering: no, “Cambridge Scholars Publishing” is not the same as Cambridge University Press.

Kinda creepy that someone who “vehemently” makes a false statement about plagiarism published in his own journal has published a book on “Guidelines for academic researchers.”

We seem to have entered a funhouse-mirror version of academia, with entire journals and subfields of fake articles, advisers training new students to enter fake academic careers, and all of it, in a Gresham’s law sort of way, crowding out legitimate teaching and research.

Not written by a chatbot

The published article from the above-discussed journal that got this whole “torment executioners” thing started was called “Using Science to Minimize Sleep Deprivation that May Reduce Train Accidents.” It’s two paragraphs long, includes a mislabeled figure that was a stock image of a fly, and has no content.

I showed that article to a colleague, who asked whether it was written by ChatGPT. I said no, I didn’t think so, because it was too badly written to be by a chatbot. I was not joking! Chatbot text is coherent at some level, often following something like the format of the standard five-paragraph high school essay, while this article did not make any sense at all. I think it’s more likely that it was a really bad student paper, maybe something written in desperation in the dwindling hours before the assignment was due, which then got published in this fake journal. On the other hand, it was published in 2022, and chatbots were not so good back in 2022, so maybe it really is the product of an incompetent chatbot. Or maybe it was put together from plagiarized material, as in the “torment executioners” paper, and we just don’t have the original source to demonstrate it. My guess remains that it was a human-constructed bit of nonsense, but anyone who would do this sort of thing today would presumably use a chatbot. So in that sense these articles are a precious artifact of the past.

Back to the torment executioners

That apparently plagiarized article was still bugging me. One weird part of the story is that even the originally-published study seems a bit off, with statements such as “42% dentist preferred both standing and sitting position.” Maybe the authors of the “torment executioners” paper purposely picked something from a very obscure source, under the belief that then nobody would catch the copying?

What the authors of the “torment executioners” paper seem to have done is to take material from the paper that had been published earlier in a different journal and run it through a computer program that changed some of the words, perhaps to make it less easily caught by plagiarism detectors (a sketch of what such a word-swapper might look like appears after the full list below). Here’s the map of transformations:

"acquired" -> "procured"
"vision" -> "perception"
"incidence" -> "effect"
"involvement" -> "association"
"followed" -> "taken after"
"Majority of them" -> "The larger part of the dental practitioner"
"intensity of pain" -> "concentration of torment"

Ha! Now we’re getting somewhere. “Concentration of torment,” indeed.

OK, let’s continue:

"discomfort" -> "inconvenience"
"aching" -> "hurting"
"paracetamol" -> "drugs"
"pain killer" -> "torment executioners"

Bingo! We found it. It’s interesting that this last word was made plural in translation. This suggests that the computer program that did these word swaps also had some sort of grammar and usage checker, so as a side benefit it fixed a few errors in the writing of the original article. The result is to take an already difficult-to-read passage and make it nearly incomprehensible.

But we’re not yet done with this paragraph. We also see:

"agreed to the fact" -> "concurred to the truth"

This is a funny one, because “concurred” is a reasonable synonym for “agreed,” and “truth” is not a bad replacement for “fact,” but when you put it together you get “concurred to the truth,” which doesn’t work here at all.

And more:

"pain" -> "torment level"
"aggravates" -> "bothers"
"repetitive movements" -> "tedious developments"

Whoa! That makes no sense at all. A modern chatbot would do it much better, I guess.

Here are a few more fun ones, still from this same paragraph of Ferguson et al. (2019):

"Conclusions:" -> "To conclude"
"The present study" -> "the display consideration"

“Display consideration”? Huh?

"high prevalence" -> "tall predominance"

This reminded me of Lucius Shepard’s classic story, “Barnacle Bill the Spacer,” which featured a gang called the Strange Magnificence. Maybe the computer program was having some fun here!

"disorders" -> "disarrangement"
"dentist" -> "dental specialists"
"so there should be" -> "in this manner"
"preventing" -> "avoiding"
"delivered" -> "conveyed"
"during" -> "amid"
"undergraduate curriculum" -> "undergrad educational programs"
"should be programmed" -> "ought to be put up"
"explain" -> "clarify"
"prolonged" -> "drawn out"

Finally, “bed posture density” becomes “bed pose density.” I don’t know about this whole “bed posture” thing . . . maybe someone could call up the Dean of Engineering at the University of Nevada and find out what’s up with that.
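For what it’s worth, the mechanism seems to be nothing fancier than blind phrase substitution. Here’s a hypothetical sketch of what such a “spinner” might amount to, using a few of the swaps identified above; the function and its table are my construction for illustration, not anything recovered from the actual paper:

import re

# A few of the substitutions observed above; a real "spinner" presumably has a
# much bigger thesaurus and just as little regard for context.
SWAPS = {
    "pain killer": "torment executioners",
    "intensity of pain": "concentration of torment",
    "agreed to the fact": "concurred to the truth",
    "repetitive movements": "tedious developments",
    "high prevalence": "tall predominance",
}

def spin(text):
    """Blindly replace phrases, longest first, with no regard for meaning."""
    for phrase in sorted(SWAPS, key=len, reverse=True):
        text = re.sub(re.escape(phrase), SWAPS[phrase], text, flags=re.IGNORECASE)
    return text

print(spin("Dentists reported a high prevalence of pain from repetitive movements."))
# -> Dentists reported a tall predominance of pain from tedious developments.

Blind substitution of this sort, maybe followed by an automated grammar pass, would produce exactly the kind of text quoted here.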

The whole article is hilarious, not just that paragraph. It’s a fun game to try to figure out the original source of phrases such as “indigent body movements” (indigent = poor) and “There are some signs when it comes to musculoskeletal as well” (I confess to being baffled by this one), and, my personal favorite, “Several studies have shown that overweight children are an actual thing.”

Whaddya say, president and provost of the University of Nevada, Reno? Are you happy that your dean of engineering is running a journal that publishes a paper like that? “Overweight children are an actual thing.”

Oh, it’s ok, that paper was never read from beginning to end by anybody—authors included.

Actually, this sentence might be my absolute favorite:

Having consolation in their shoes, having vigor in their shoes, and having quality in their shoes come to play within the behavioral design of youthful and talented kids with respect to the footwear they select to wear.

“Having vigor in their shoes” . . . that’s what it’s all about!

There’s “confidential dental clinics”: I guess “confidential” is being used as a “synonym” for private. And this:

Dental practitioners and other wellbeing callings in fact cannot dodge inactive stances for an awfully long time.

Exactly what you’d expect to see in a legitimate outlet like the International Supply Chain Technology Journal.

I think the authors of this article are well qualified to teach in the USC medical school. They just need to work in some crazy giraffe facts and they’ll be just fine.

With the existence of chatbots, there will never be a need for this sort of ham-fisted plagiarism. End of an era. Kinda makes me sad.

P.S. As always, we laugh only to avoid crying. I remain furious on behalf of the hardworking students and faculty at UNR, not to mention the taxpayers of the state of Nevada, who are paying for this sort of thing. The phrase “torment executioners” has entered the lexicon.

P.P.S. Regarding the figures at the top of the post: I’ve coauthored papers with students. That’s fine; it’s a way that students can learn. I’m not at all trying to mock the students who made those pictures, if indeed that’s who drew them. I am criticizing whoever thought it was a good idea to publish this, not to mention to include it on professional C.V.’s. As a teacher, when you work with students, you try to help them do their best; you don’t stick your name on their crude drawings, plagiarized work, etc., which can’t be anyone’s best. I feel bad for any students who got sucked into this endeavor and were told that this sort of thing is acceptable work.

P.P.P.S. It looks like there may be yet more plagiarism going on; see here.

P.P.P.P.S. Retraction Watch found more plagiarism, this time on a report for the National Science Foundation.