Preregistration is a floor, not a ceiling.

This comes up from time to time. For example, someone sent me an email expressing the concern that preregistration stifles innovation: if Fleming had preregistered his study, he never would’ve noticed the penicillin mold, etc.

My response is that preregistration is a floor, not a ceiling. Preregistration is a list of things you plan to do, that’s all. Preregistration does not stop you from doing more. If Fleming had followed a pre-analysis protocol, that would’ve been fine: there would have been nothing stopping him from continuing to look at his bacterial cultures.

As I wrote in comments to my 2022 post, “What’s the difference between Derek Jeter and preregistration?” (which I just added to the lexicon), you don’t preregister “the” exact model specification; you preregister “an” exact model specification, and you’re always free to fit other models once you’ve seen the data.

It can be really valuable to preregister, to formulate hypotheses and simulate fake data before gathering any real data. To do this requires assumptions—it takes work!—and I think it’s work that’s well spent. And then, when the data arrive, do everything you’d planned to do, along with whatever else you want to do.

Planning ahead should not get in the way of creativity. It should enhance creativity because you can focus your data-analytic efforts on new ideas rather than having to first figure out what defensible default thing you’re supposed to do.

Aaaand, pixels are free, so here’s that 2022 post in full:

“Hot hand”: The controversy that shouldn’t be. And thinking more about what makes something into a controversy:

I was involved in a recent email discussion, leading to this summary:

There is no theoretical or empirical reason for the hot hand to be controversial. The only good reason for there being a controversy is that the mistaken paper by Gilovich et al. appeared first. At this point we should give Gilovich et al. credit for bringing up the hot hand as a subject of study and accept that they were wrong in their theory, empirics, and conclusions, and we can all move on. There is no shame in this for Gilovich et al. We all make mistakes, and what’s important is not the personalities but the research that leads to understanding, often through tortuous routes.

“No theoretical reason”: see discussion here, for example.

“No empirical reason”: see here and lots more in the recent literature.

“The only good reason . . . appeared first”: Beware the research incumbency rule.

More generally, what makes something a controversy? I’m not quite sure, but I think the news media play a big part. We talked about this recently in the context of the always-popular UFOs-as-space-aliens theory, which used to be considered a joke in polite company but now seems to have reached the level of controversy.

I don’t have anything systematic to say about all this right now, but the general topic seems very worthy of study.

Zotero now features retraction notices

David Singerman writes:

Like a lot of other humanities and social sciences people I use Zotero to keep track of citations, create bibliographies, and even take & store notes. I also am not alone in using it in teaching, making it a required tool for undergraduates in my classes so they learn to think about organizing their information early on. And it has sharing features too, so classes can create group bibliographies that they can keep using after the semester ends.

Anyway my desktop client for Zotero updated itself today and when it relaunched I had a big red banner informing me that an article in my library had been retracted! I didn’t recognize it at first, but eventually realized that was because it was an article one of my students had added to their group library for a project.

The developers did a good job of making the alert unmissable (i.e. not like a corrections notice in a journal), the full item page contains lots of information and helpful links about the retraction, and there’s a big red X next to the listing in my library. See attached screenshots.

The way they implemented it will also help the teaching component, since a student will get this alert too.

Singerman adds this P.S.:

This has reminded me that some time ago you posted something about David Byrne, and whatever you said, it made me think of David Byrne’s wonderful appearance on the Colbert Report.

What was amazing to me when I saw it was that it’s kind of like a battle between Byrne’s inherent weirdness and sincerity, and Colbert’s satirical right-wing bloviator character. Usually Colbert’s character was strong enough to defeat all comers, but . . . decide for yourself.

Refuted papers continue to be cited more than their failed replications: Can a new search engine be built that will fix this problem?

Paul von Hippel writes:

Stuart Buck noticed your recent post on A WestLaw for Science. This is something that Stuart and I started talking about last year, and Stuart, who trained as an attorney, believes it was first suggested by a law professor about 15 years ago.

Since the 19th century, the legal profession has had citation indices that do far more than count citations and match keywords. Resources like Shepard’s Citations—first printed in 1873 and now published online along with competing tools such as JustCite, KeyCite, BCite, and SmartCite—do not just find relevant cases and statutes; they show lawyers whether a case or statute is still “good law.” Legal citation indexes show lawyers which cases have been affirmed or cited approvingly, and which have been criticized, reversed, or overruled by later courts.

Although Shepard’s Citations inspired the first Science Citation Index in 1960, which in turn inspired tools like Google Scholar, today’s academic search engines still rely primarily on citation counts and keywords. As a result, many scientists are like lawyers who walk into the courtroom unaware that a case central to their argument has been overruled.

Kind of, but not quite. A key difference is that in the courtroom there is some reasonable chance that the opposing lawyer or the judge will notice that the key case has been overruled, so that your argument that hinges on that case will fail. You have a clear incentive to not rely on overruled cases. In science, however, there’s no opposing lawyer and no judge: you can build an entire career on studies that fail to replicate, and no problem at all, as long as you don’t pull any really ridiculous stunts.

Von Hippel continues:

Let me share a couple of relevant articles that we recently published.

One, titled “Is Psychological Science Self-Correcting?,” reports that replication studies, whether successful or unsuccessful, rarely have much effect on citations to the studies being replicated. When a finding fails to replicate, most influential studies sail on, continuing to gather citations at a similar rate for years, as though the replication had never been tried. The issue is not limited to psychology and raises serious questions about how quickly the scientific community corrects itself, and whether replication studies are having the correcting influence that we would like them to have. I considered several possible reasons for the persistent influence of studies that failed to replicate, and concluded that academic search engines like Google Scholar may well be part of the problem, since they prioritize highly cited articles, replicable or not, perpetuating the influence of questionable findings.

The finding that replications don’t affect citations has itself replicated pretty well. A recent blog post by Bob Reed at the University of Canterbury, New Zealand, summarized five recent papers that showed more or less the same thing in psychology, economics, and Nature/Science publications.

In a second article, published just last week in Nature Human Behaviour, Stuart Buck and I suggest ways to improve academic search engines to reduce scholars’ biases. We suggest that the next generation of academic search engines should do more than count citations; they should help scholars assess studies’ rigor and reliability. We also suggest that future engines should be transparent, responsive, and open source.

This seems like a reasonable proposal. The good news is that it’s not necessary for their hypothetical new search engine to dominate or replace existing products. People can use Google Scholar to find the most cited papers and use this new thing to learn about rigor and reliability. A nudge in the right direction, you might say.

With journals, it’s all about the wedding, never about the marriage.

John “not Jaws” Williams writes:

Here is another example of how hard it is to get erroneous publications corrected, this time from the climatology literature, and how poorly peer review can work.

From the linked article, by Gavin Schmidt:

Back in March 2022, Nicola Scafetta published a short paper in Geophysical Research Letters (GRL) . . . We (me, Gareth Jones and John Kennedy) wrote a note up within a couple of days pointing out how wrongheaded the reasoning was and how the results did not stand up to scrutiny. . . .

After some back and forth on how exactly this would work (including updating the GRL website to accept comments), we reformatted our note as a comment, and submitted it formally on December 12, 2022. We were assured from the editor-in-chief and publications manager that this would be a ‘streamlined’ and ‘timely’ review process. With respect to our comment, that appeared to be the case: It was reviewed, received minor comments, was resubmitted, and accepted on January 28, 2023. But there it sat for 7 months! . . .

The issue was that the GRL editors wanted to have both the comment and a reply appear together. However, the reply had to pass peer review as well, and that seems to have been a bit of a bottleneck. But while the reply wasn’t being accepted, our comment sat in limbo. Indeed, the situation inadvertently gives the criticized author(s) an effective delaying tactic since, as long as a reply is promised but not delivered, the comment doesn’t see the light of day. . . .

All in all, it took 17 months, two separate processes, and dozens of emails, who knows how much internal deliberation, for an official comment to get into the journal pointing issues that were obvious immediately the paper came out. . . .

The odd thing about how long this has taken is that the substance of the comment was produced extremely quickly (a few days) because the errors in the original paper were both commonplace and easily demonstrated. The time, instead, has been entirely taken up by the process itself. . . .

Schmidt also asks a good question:

Why bother? . . . Why do we need to correct the scientific record in formal ways when we have abundant blogs, PubPeer, and social media, to get the message out?

His answer:

Since journals remain extremely reluctant to point to third party commentary on their published papers, going through the journals’ own process seems like it’s the only way to get a comment or criticism noticed by the people who are reading the original article.

Good point. I’m glad that there are people like Schmidt and his collaborators who go to the trouble to correct the public record. I do this from time to time, but mostly I don’t like the stress of dealing with the journals so I’ll just post things here.

My reaction

This story did not surprise me. I’ve heard it a million times, and it’s often happened to me, which is why I once wrote an article called “It’s too hard to publish criticisms and obtain data for replication.”

Journal editors mostly hate to go back and revise anything. They’re doing volunteer work, and they’re usually in it because they want to publish new and exciting work. Replications, corrections, etc., that’s all seen as boooooring.

With journals, it’s all about the wedding, never about the marriage.

Mindlessness in the interpretation of a study on mindlessness (and why you shouldn’t use the word “whom” in your dating profile)

This is a long post, so let me give you the tl;dr right away: Don’t use the word “whom” in your dating profile.

OK, now for the story. Fasten your seat belts, it’s going to be a bumpy night.

It all started with this message from Dmitri with subject line, “Man I hate to do this to you but …”, which continued:

How could I resist?

https://www.cnbc.com/2024/02/15/using-this-word-can-make-you-more-influential-harvard-study.html

I’m sorry, let me try again … I had to send this to you BECAUSE this is the kind of obvious shit you like to write about. I like how they didn’t even do their own crappy study they just resurrected one from the distant past.

OK, ok, you don’t need to shout about it!

Following the link, we see this breathless, press-release-style CNBC news story:

Using this 1 word more often can make you 50% more influential, says Harvard study

Sometimes, it takes a single word — like “because” — to change someone’s mind.

That’s according to Jonah Berger, a marketing professor at the Wharton School of the University of Pennsylvania who’s compiled a list of “magic words” that can change the way you communicate. Using the word “because” while trying to convince someone to do something has a compelling result, he tells CNBC Make It: More people will listen to you, and do what you want.

Berger points to a nearly 50-year-old study from Harvard University, wherein researchers sat in a university library and waited for someone to use the copy machine. Then, they walked up and asked to cut in front of the unknowing participant.

They phrased their request in three different ways:

“May I use the Xerox machine?”
“May I use the Xerox machine because I have to make copies?”
“May I use the Xerox machine because I’m in a rush?”
Both requests using “because” made the people already making copies more than 50% more likely to comply, researchers found. Even the second phrasing — which could be reinterpreted as “May I step in front of you to do the same exact thing you’re doing?” — was effective, because it indicated that the stranger asking for a favor was at least being considerate about it, the study suggested.

“Persuasion wasn’t driven by the reason itself,” Berger wrote in a book on the topic, “Magic Words,” which published last year. “It was driven by the power of the word.” . . .

Let’s look into this claim. The first thing I did was click to the study—full credit to CNBC Make It for providing the link—and here’s the data summary from the experiment:

If you look carefully and do some simple calculations, you’ll see that the percentage of participants who complied was 37.5% under treatment 1, 50% under treatment 2, and 62.5% under treatment 3. So, ok, not literally true that both requests using “because” made the people already making copies more than 50% more likely to comply: 0.50/0.375 = 1.33, and an increase of 33% is not “more than 50%.” But, sure, it’s a positive result. There were 40 participants in each treatment, so the standard error is approximately 0.5/sqrt(40) = 0.08 for each of those averages. The key difference here is 0.50 – 0.375 = 0.125, which is the difference between the compliance rates under the treatments “May I use the Xerox machine?” and “May I use the Xerox machine because I have to make copies?”, and this will have a standard error of approximately sqrt(2)*0.08 = 0.11.
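Just to spell out that arithmetic, here’s a quick check (a minimal sketch in Python; the compliance rates and the 40-per-arm sample size are taken from the summary above):

```python
import math

n = 40  # participants per treatment arm, as described above
p_no_reason = 0.375  # "May I use the Xerox machine?"
p_placebic = 0.500   # "... because I have to make copies?"
p_real = 0.625       # "... because I'm in a rush?"

# Relative increase quoted in the news story vs. what the numbers say
print(p_placebic / p_no_reason)  # 1.33: a 33% increase, not "more than 50%"

# Difference in compliance rates and its rough standard error
diff = p_placebic - p_no_reason   # 0.125
se_each = 0.5 / math.sqrt(n)      # ~0.08, conservative binomial s.e. per arm
se_diff = math.sqrt(2) * se_each  # ~0.11
print(diff, se_diff)              # 12.5 percentage points, give or take 11
```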

The quick summary from this experiment: an observed difference in compliance rates of 12.5 percentage points, with a standard error of 11 percentage points. I don’t want to say “not statistically significant,” so let me just say that the estimate is highly uncertain, so I have no real reason to believe it will replicate.

But wait, you say: the paper was published. Presumably it has a statistically significant p-value somewhere, no? The answer is, yes, they have some “p < .05” results, just not of that particular comparison. Indeed, if you just look at the top rows of that table (Favor = small), then the difference is 0.93 – 0.60 = 0.33 with a standard error of sqrt(0.6*0.4/15 + 0.93*0.07/15) = 0.14, so that particular estimate is just more than two standard errors away from zero. Whew!

But now we’re getting into forking paths territory:

- Noisy data
- Small sample
- Lots of possible comparisons
- Any comparison that’s statistically significant will necessarily be huge
- Open-ended theoretical structure that could explain just about any result.

I’m not saying the researchers were trying to do anything wrong. But remember, honesty and transparency are not enuf. Such a study is just too noisy to be useful.
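Here’s the same kind of check for that small-favor comparison (again just a sketch, using the cell proportions and the sample size of 15 per cell given above):

```python
import math

# Small-favor rows of the table, as described above
p1, n1 = 0.60, 15  # no reason given
p2, n2 = 0.93, 15  # "because" with a placebic reason

diff = p2 - p1                                           # 0.33
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # ~0.14
print(diff, se, diff / se)  # a bit more than two standard errors from zero
```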

But, sure, back in the 1970s many psychology researchers not named Meehl weren’t aware of these issues. They seem to have been under the impression that if you gather some data and find something statistically significant for which you could come up with a good story, that you’d discovered a general truth.

What’s less excusable is a journalist writing this in the year 2024. But it’s no surprise, conditional on the headline, “Using this 1 word more often can make you 50% more influential, says Harvard study.”

But what about that book by the University of Pennsylvania marketing professor? I searched online, and, fortunately for us, the bit about the Xerox machine is right there in the first chapter, in the excerpt we can read for free. Here it is:

He got it wrong, just like the journalist did! It’s not true that including the meaningless reason increased persuasion just as much as the valid reason did. Look at the data! The outcomes under the three treatments were 37.5%, 50%, and 62.5%. 50% – 37.5% ≠ 62.5% – 37.5%. Ummm, ok, he could’ve said something like, “Among a selected subset of the data with only 15 or 16 people in each treatment, including the meaningless reason increased persuasion just as much as the valid reason did.” But that doesn’t sound so impressive! Even if you add something like, “and it’s possible to come up with a plausible theory to go with this result.”

The book continues:

Given the flaws in the description of the copier study, I’m skeptical about these other claims.

But let me say this. If it is indeed true that using the word “whom” in online dating profiles makes you 31% more likely to get a date, then my advice is . . . don’t use the word “whom”! Think of it from a potential-outcomes perspective. Sure, you want to get a date. But do you really want to go on a date with someone who will only go out with you if you use the word “whom”?? That sounds like a really pretentious person, not a fun date at all!

OK, I haven’t read the rest of the book, and it’s possible that somewhere later on the author says something like, “OK, I was exaggerating a bit on page 4 . . .” I doubt it, but I guess it’s possible.

Replications, anyone?

To return to the topic at hand: In 1978 a study was conducted with 120 participants in a single location. The study was memorable enough to be featured in a business book nearly fifty years later.

Surely the finding has been replicated?

I’d imagine yes; on the other hand, if it had been replicated, this would’ve been mentioned in the book, right? So it’s hard to know.

I did a search, and the article does seem to have been influential:

It’s been cited 1514 times—that’s a lot! Google lists 55 citations in 2023 alone, and in what seem to be legit journals: Human Communication Research, Proceedings of the ACM, Journal of Retailing, Journal of Organizational Behavior, Journal of Applied Psychology, Human Resources Management Review, etc. Not core science journals, exactly, but actual applied fields, with unskeptical mentions such as:

What about replications? I searched on *langer blank chanowitz 1978 replication* and found this paper by Folkes (1985), which reports:

Four studies examined whether verbal behavior is mindful (cognitive) or mindless (automatic). All studies used the experimental paradigm developed by E. J. Langer et al. In Studies 1–3, experimenters approached Ss at copying machines and asked to use it first. Their requests varied in the amount and kind of information given. Study 1 (82 Ss) found less compliance when experimenters gave a controllable reason (“… because I don’t want to wait”) than an uncontrollable reason (“… because I feel really sick”). In Studies 2 and 3 (42 and 96 Ss, respectively) requests for controllable reasons elicited less compliance than requests used in the Langer et al study. Neither study replicated the results of Langer et al. Furthermore, the controllable condition’s lower compliance supports a cognitive approach to social interaction. In Study 4, 69 undergraduates were given instructions intended to increase cognitive processing of the requests, and the pattern of compliance indicated in-depth processing of the request. Results provide evidence for cognitive processing rather than mindlessness in social interaction.

So this study concludes that the result didn’t replicate at all! On the other hand, it’s only a “partial replication,” and indeed they do not use the same conditions and wording as in the original 1978 paper. I don’t know why not, except maybe that exact replications traditionally get no respect.

Langer et al. responded in that journal, writing:

We see nothing in her results [Folkes (1985)] that would lead us to change our position: People are sometimes mindful and sometimes not.

Here they’re referring to the table from the 1978 study, reproduced at the top of this post, which shows a large effect of the “because I have to make copies” treatment under the “Small Favor” condition but no effect under the “Large Favor” condition. Again, given the huge standard errors here, we can’t take any of this seriously, but if you just look at the percentages without considering the uncertainty, then, sure, that’s what they found. Thus, in their response to the partial replication study that did not reproduce their results, Langer et al. emphasized that their original finding was not a main effect but an interaction: “People are sometimes mindful and sometimes not.”

That’s fine. Psychology studies often measure interactions, as they should: the world is a highly variable place.

But, in that case, everyone’s been misinterpreting that 1978 paper! When I say “everybody,” I mean this recent book by the business school professor and also the continuing references to the paper in the recent literature.

Here’s the deal. The message that everyone seems to have learned, or believed they learned, from the 1978 paper is that meaningless explanations are as good as meaningful explanations. But, according to the authors of that paper when they responded to criticism in 1985, the true message is that this trick works sometimes and sometimes not. That’s a much weaker message.

Indeed the study at hand is too small to draw any reliable conclusions about any possible interaction here. The most direct estimate of the interaction effect from the above table is (0.93 – 0.60) – (0.24 – 0.24) = 0.33, with a standard error of sqrt(0.93*0.07/15 + 0.60*0.40/15 + 0.24*0.76/25 + 0.24*0.76/25) = 0.19. So, no, I don’t see much support for the claim in this post from Psychology Today:

So what does this all mean? When the stakes are low people will engage in automatic behavior. If your request is small, follow your request with the word “because” and give a reason—any reason. If the stakes are high, then there could be more resistance, but still not too much.
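To spell out that interaction arithmetic, here’s a quick check (a sketch using the cell proportions and the sample sizes of 15 and 25 given above):

```python
import math

# Cell proportions from the 1978 table, as quoted in the text
p_small_because, n_small_because = 0.93, 15
p_small_none, n_small_none = 0.60, 15
p_large_because, n_large_because = 0.24, 25
p_large_none, n_large_none = 0.24, 25

# Difference-in-differences estimate of the interaction
interaction = (p_small_because - p_small_none) - (p_large_because - p_large_none)  # 0.33

# Standard error, adding the binomial variances of the four cells
se = math.sqrt(
    p_small_because * (1 - p_small_because) / n_small_because
    + p_small_none * (1 - p_small_none) / n_small_none
    + p_large_because * (1 - p_large_because) / n_large_because
    + p_large_none * (1 - p_large_none) / n_large_none
)  # ~0.19

print(interaction, se)  # 0.33 +/- 0.19: far too noisy to support strong claims
```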

This happens a lot in unreplicable or unreplicated studies: a result is found under some narrow conditions, and then it is taken to have very general implications. This is just an unusual case where the authors themselves pointed out the issue. As they wrote in their 1985 article:

The larger concern is to understand how mindlessness works, determine its consequences, and specify better the conditions under which it is and is not likely to occur.

That’s a long way from the claim in that business book that “because” is a “magic word.”

Like a lot of magic, it only works under some conditions, and you can’t necessarily specify those conditions ahead of time. It works when it works.

There might be other replication studies of this copy machine study. I guess you couldn’t really do it now, because people don’t spend much time waiting at the copier. But the office copier was a thing for several decades. So maybe there are even some exact replications out there.

In searching for a replication, I did come across this post from 2009 by Mark Liberman that criticized yet another hyping of that 1978 study, this time from a paper by psychologist Daniel Kahneman in the American Economic Review. Kahneman wrote:

Ellen J. Langer et al. (1978) provided a well-known example of what she called “mindless behavior.” In her experiment, a confederate tried to cut in line at a copying machine, using various preset “excuses.” The conclusion was that statements that had the form of an unqualified request were rejected (e.g., “Excuse me, may I use the Xerox machine?”), but almost any statement that had the general form of an explanation was accepted, including “Excuse me, may I use the Xerox machine because I want to make copies?” The superficiality is striking.

As Liberman writes, this represented a “misunderstanding of the 1978 paper’s results, involving both a different conclusion and a strikingly overgeneralized picture of the observed effects.” Liberman performs an analysis of the data from that study which is similar to what I have done above.

Liberman summarizes:

The problem with Prof. Kahneman’s interpretation is not that he took the experiment at face value, ignoring possible flaws of design or interpretation. The problem is that he took a difference in the distribution of behaviors between one group of people and another, and turned it into generic statements about the behavior of people in specified circumstances, as if the behavior were uniform and invariant. The resulting generic statements make strikingly incorrect predictions even about the results of the experiment in question, much less about life in general.

Mindfulness

The key claim of all this research is that people are often mindless: they respond to the form of a request without paying attention to its context, with “because” acting as a “magic word.”

I would argue that this is exactly the sort of mindless behavior being exhibited by the people who are promoting that copying-machine experiment! They are taking various surface aspects of the study and using it to draw large, unsupported conclusions, without being mindful of the details.

In this case, the “magic words” are things like “p < .05,” “randomized experiment,” “Harvard,” “peer review,” and “Journal of Personality and Social Psychology” (this notwithstanding). The mindlessness comes from not looking into what exactly was in the paper being cited.

In conclusion . . .

So, yeah, thanks for nothing, Dmitri! Three hours of my life spent going down a rabbit hole. But, hey, if any readers who are single have read far enough down in the post to see my advice not to use “whom” in your dating profile, it will all have been worth it.

Seriously, though, the “mindlessness” aspect of this story is interesting. The point here is not, Hey, a 50-year-old paper has some flaws! Or the no-less-surprising observation: Hey, a pop business book exaggerates! The part that fascinates me is that there’s all this shaky research that’s being taken as strong evidence that consumers are mindless—and the people hyping these claims are themselves demonstrating the point by mindlessly following signals without looking into the evidence.

The ultimate advice that the mindfulness gurus are giving is not necessarily so bad. For example, here’s the conclusion of that online article about the business book:

Listen to the specific words other people use, and craft a response that speaks their language. Doing so can help drive an agreement, solution or connection.

“Everything in language we might use over email at the office … [can] provide insight into who they are and what they’re going to do in the future,” says Berger.

That sounds ok. Just forget all the blather about the “magic words” and the “superpowers,” and forget the unsupported and implausible claim that “Arguments, requests and presentations aren’t any more or less convincing when they’re based on solid ideas.” As often is the case, I think these Ted-talk style recommendations would be on more solid ground if they were just presented as the product of common sense and accumulated wisdom, rather than leaning on some 50-year-old psychology study that just can’t bear the weight. But maybe you can’t get the airport book and the Ted talk without a claim of scientific backing.

Don’t get me wrong here. I’m not attributing any malign motivations to any of the people involved in this story (except for Dmitri, I guess). I’m guessing they really believe all this. And I’m not using “mindless” as an insult. We’re all mindless sometimes—that’s the point of the Langer et al. (1978) study; it’s what Herbert Simon called “bounded rationality.” The trick is to recognize your areas of mindlessness. If you come to an area where you’re being mindless, don’t write a book about it! Even if you naively think you’ve discovered a new continent. As Mark Twain apparently never said, it ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.

The usual disclaimer

I’m not saying the claims made by Langer et al. (1978) are wrong. Maybe it’s true that, under conditions of mindlessness, all that matters is the “because” and any empty explanation will do; maybe the same results would show up in a preregistered replication. All I’m saying is that the noisy data that have been presented don’t provide any strong evidence in support of such claims, and that’s what bothers me about all those confident citations in the business literature.

P.S.

After writing the above post, I sent this response to Dmitri:

OK, I just spent 3 hours on this. I now have to figure out what to do with this after blogging it, because I think there are some important points here. Still, yeah, you did a bad thing by sending this to me. These are 3 hours I could’ve spent doing real work, or relaxing . . .

He replied:

I mean, yeah, that’s too bad for you, obviously. But … try to think about it from my point of view. I am more influential, I got you to work on this while I had a nice relaxing post-Valentine’s day sushi meal with my wife (much easier to get reservations on the 15th and the flowers are a lot cheaper), while you were toiling away on what is essentially my project. I’d say the magic words did their job.

Good point! He exploited my mindlessness. I responded:

Ok, I’ll quote you on that one too! (minus the V-day details).

I’m still chewing on your comment that you appreciate the Beatles for their innovation as much as for their songs. The idea that there are lots of songs of similar quality but not so much innovation, that’s interesting. The only thing is that I don’t know enough about music, even pop music, to have a mental map of where everything fits in. For example, I recently heard that Coldplay song, and it struck me that it was in the style of U2. But I don’t really know if U2 was the originator of that soaring sound. I guess Pink Floyd is kinda soaring too, but not quite in the same way . . . etc etc … the whole thing was frustrating to me because I had no sense of whether I was entirely bullshitting or not.

So if you can spend 3 hours writing a post on the above topic, we’ll be even.

Dmitri replied:

I am proud of the whole “Valentine’s day on the 15th” trick, so you are welcome to include it. That’s one of our great innovations. After the first 15-20 Valentine’s days, you can just move the date a day later and it is much easier.

And, regarding the music, he wrote:

U2 definitely invented a sound, with the help of their producer Brian Eno.

It is a pretty safe bet that every truly successful musician is an innovator—once you know the sound it is easy enough to emulate. Beethoven, Charlie Parker, the Beatles, all the really important guys invented a forceful, effective new way of thinking about music.

U2 is great, but when I listened to an entire U2 song from beginning to end, it seemed so repetitive as to be unlistenable. I don’t feel that way about the Beatles or REM. But just about any music sounds better to me in the background, which I think is a sign of my musical ignorance and tone-deafness (for real, I’m bad at recognizing pitches) more than anything else. I guess the point is that you’re supposed to dance to it, not just sit there and listen.

Anyway, I warned Dmitri about what would happen if I post his Valentine’s Day trick:

I post this, then it will catch on, and it will no longer work . . . just warning ya! You’ll have to start doing Valentine’s Day on the 16th, then the 17th, . . .

To which Dmitri responded:

Yeah but if we stick with it, it will roll around and we will get back to February 14 while everyone else is celebrating Valentines Day on these weird wrong days!

I’ll leave him with the last word.

How to code and impute income in studies of opinion polls?

Nate Cohn asks:

What’s your preferred way to handle income in a regression when income categories are inconsistent across several combined survey datasets? Am I best off just handling this with multiple categorical variables? Can I safely create a continuous variable?

My reply:

I thought a lot about this issue when writing Red State Blue State. My preferred strategy is to use a variable that we could treat as continuous. For example, when working with ANES data I was using income categories 1,2,3,4,5, which corresponded to the 1-16th, 16-33rd, 34-66th, 67-95th, and 96-100th income percentiles. If you have different surveys with different categories, you could use somewhat consistent scalings; for example, one survey might be coded as 1,3,5,7 and another as 2,4,6,8. I expect that other people would disagree with this advice, but this is the sort of thing that I was doing. I’m not so much worried about the scale being imperfect or nonlinear. But if you have a non-monotonic relation, you’ll have to be more careful.
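To make that concrete, here’s the kind of recoding I have in mind (a hypothetical sketch in Python; the category-to-percentile mappings are made up for illustration and would need to come from each survey’s actual income breakpoints):

```python
# Map each survey's income categories onto a rough common scale, here the
# approximate midpoint of the income-percentile range each category covers.
# These particular mappings are hypothetical, for illustration only.
survey_a_to_percentile = {1: 8, 2: 25, 3: 50, 4: 81, 5: 98}  # five ANES-style categories
survey_b_to_percentile = {1: 12, 2: 37, 3: 63, 4: 88}        # a survey with coarser categories

def income_scale(category, mapping):
    """Return an approximate income percentile (0-100) for a survey-specific category."""
    return mapping.get(category)  # None if income was not asked or not answered

# Respondents from different surveys end up on one (imperfect) continuous scale
print(income_scale(3, survey_a_to_percentile))  # 50
print(income_scale(3, survey_b_to_percentile))  # 63
```

You’d then use that common scale, possibly after centering or transforming it, as the continuous income predictor in the regression.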

Cohn responds:

Two other thoughts for consideration:

— I am concerned about non-monotonicity. At least in this compilation of 2020 data, the Democrats do best among rich and poor, and sag in the middle. It seems even more extreme when we get into the highest/lowest income strata, ala ANES. I’m not sure this survives controls—it seems like there’s basically no income effect after controls—but I’m hesitant to squelch a possible non-monotonic effect that I haven’t ruled out.

— I’m also curious for your thoughts on a related case. Suppose that (a) the dataset includes surveys that sometimes asked about income and sometimes did not, (b) we’re interested in many demographic covariates besides income, and (c) we’d otherwise clearly specify the interaction between income and the other variables. The missing income data creates several challenges. What should we do?

I can imagine some hacky solutions to the NA data problem other than outright removing observations (say, set all NA income to 1 and interact our continuous income variable with whether we have actual income data), but if we interact other variables with the NA income data there are lots of cases (say, MRP where the population strata specify income for the full population, not in proportion to survey coverage) where we’d risk losing much of the power gleaned from other surveys about the other demographic covariates. What should we do here?

My quick recommendation is to fit a model with two stages, first predicting income given your other covariates, then predicting your outcome of interest (issue attitude, vote preference, whatever) given income and the other covariates. You can fit the two models simultaneously in one Stan program. I guess then you will want some continuous coding for income (could be something like sqrt(income) with income topcoded at $300K) along with a possibly non-monotonic model at the second level.
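Here’s a rough sketch of what that could look like, written as a Stan program called from Python via cmdstanpy. This is not a worked-out model, just an illustration of the structure: a normal regression that imputes the continuous income score from the other covariates, and a logistic regression for the outcome that uses observed income where available and the imputed value where it is missing. The variable names, priors, and data layout are all placeholders.

```python
from cmdstanpy import CmdStanModel

stan_code = """
data {
  int<lower=0> N_obs;                        // respondents with income observed
  int<lower=0> N_mis;                        // respondents with income missing
  int<lower=1> K;                            // number of other covariates
  matrix[N_obs, K] X_obs;
  matrix[N_mis, K] X_mis;
  vector[N_obs] income_obs;                  // continuous income coding, e.g. sqrt of topcoded income
  array[N_obs] int<lower=0, upper=1> y_obs;  // outcome, e.g. vote preference
  array[N_mis] int<lower=0, upper=1> y_mis;
}
parameters {
  real a0;
  vector[K] a;                               // stage 1: income given other covariates
  real<lower=0> sigma;
  real b0;
  vector[K] b;                               // stage 2: outcome given income and covariates
  real b_income;
  vector[N_mis] income_mis;                  // imputed income values, treated as parameters
}
model {
  // stage 1: predict income from the other covariates
  income_obs ~ normal(a0 + X_obs * a, sigma);
  income_mis ~ normal(a0 + X_mis * a, sigma);
  // stage 2: predict the outcome from income plus the other covariates
  y_obs ~ bernoulli_logit(b0 + X_obs * b + b_income * income_obs);
  y_mis ~ bernoulli_logit(b0 + X_mis * b + b_income * income_mis);
  // weakly informative priors (placeholders)
  a0 ~ normal(0, 2.5);  a ~ normal(0, 2.5);  sigma ~ normal(0, 2.5);
  b0 ~ normal(0, 2.5);  b ~ normal(0, 2.5);  b_income ~ normal(0, 2.5);
}
"""

# Compile and fit; data_dict would hold X, y, and income from the combined
# surveys, split by whether the income question was asked.
with open("income_outcome.stan", "w") as f:
    f.write(stan_code)
model = CmdStanModel(stan_file="income_outcome.stan")
# fit = model.sample(data=data_dict)
```

Treating the missing income values as parameters means their uncertainty propagates into the second-stage coefficients, which is the point of fitting the two stages jointly rather than plugging in single imputed values.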

The four principles of Barnard College: Respect, empathy, kindness . . . and censorship?

A few months ago we had Uh oh Barnard . . .

And now there’s more:

Barnard is mandating that students remove any items affixed to room or suite doors by Feb. 28, after which point the college will begin removing any remaining items, Barnard College Dean Leslie Grinage announced in a Friday email to the Barnard community. . . .

“We know that you have been hearing often lately about our community rules and policies. And we know it may feel like a lot,” Grinage wrote. “The goal is to be as clear as possible about the guardrails, and, meeting the current moment, do what we can to support and foster the respect, empathy and kindness that must guide all of our behavior on campus.”

According to the student newspaper, here’s the full email from the Barnard dean:

Dear Residential Students,

The residential experience is an integral part of the Barnard education. Our small campus is a home away from home for most of you, and we rely on each other to help foster an environment where everyone feels welcome and safe. This is especially important in our residential spaces. We encourage debate and discussion and the free exchange of ideas, while upholding our commitment to treating one another with respect, consideration and kindness. In that spirit, I’m writing to remind you of the guardrails that guide our residential community — our Residential Life and Housing Student Guide.

While many decorations and fixtures on doors serve as a means of helpful communication amongst peers, we are also aware that some may have the unintended effect of isolating those who have different views and beliefs. So, we are asking everyone to remove any items affixed to your room and/or suite doors (e.g. dry-erase boards, decorations, messaging) by Wednesday, February 28 at noon; the College will remove any remaining items starting Thursday, February 29. The only permissible items on doors are official items placed by the College (e.g. resident name tags). (Those requesting an exemption for religious or other reasons should contact Residential Life and Housing by emailing [email protected].)

We know that you have been hearing often lately about our community rules and policies. And we know it may feel like a lot. The goal is to be as clear as possible about the guardrails, and, meeting the current moment, do what we can to support and foster the respect, empathy and kindness that must guide all of our behavior on campus.

The Residential Life and Housing team is always here to support you, and you should feel free to reach out to them with any questions you may have.

Please take care of yourselves and of each other. Together we can build an even stronger Barnard community.

Sincerely,

Leslie Grinage

Vice President for Campus Life and Student Experience and Dean of the College

The dean’s letter links to this Residential Life and Housing Student Guide, which I took a look at. It’s pretty reasonable, actually. All I saw regarding doors was this mild restriction:

While students are encouraged to personalize their living space, they may not alter the physical space of the room, drill or nail holes into any surface, or affix tapestries and similar decorations to the ceiling, light fixtures, or doorways. Painting any part of the living space or college-supplied furniture is also prohibited.

The only thing in the entire document that seemed objectionable was the no-sleeping-in-the-lounges policy, but I can’t imagine they would enforce that rule unless someone was really abusing the privilege. They’re not gonna send the campus police to wake up a napper.

So, yeah, they had a perfectly reasonable rulebook and then decided to mess it all up by not letting the students decorate their doors. So much for New York, center of free expression.

I assume what’s going on here is that Barnard wants to avoid the bad publicity that comes from clashes between groups of students with opposing political views. And now they’re getting bad publicity because they’re censoring students’ political expression.

The endgame seems to be to turn the college into some sort of centrally controlled corporate office park. But that wouldn’t be fair. In a corporate office, they let you decorate your own cubicle, right?

Leap Day Special!

The above graph is from a few years ago but is particularly relevant today!

It’s funny that, in leap years, approximately 10% fewer babies are born on 29 Feb. I think it would be cool to have a Leap Day birthday. But I guess most people, not being nerds, would prefer the less-“weird” days before and after.

There’s lots of good stuff at the above link; I encourage you to read the whole thing.

In the years since, we’ve improved Stan so we can fit and improve the birthdays time series decomposition model using full Bayesian inference.

Here’s Aki’s birthday case study which has all the details. This will also be going into our Bayesian Workflow book.

A suggestion on how to improve the broader impacts statement requirement for AI/ML papers

This is Jessica. Recall that in 2020, NeurIPS added a requirement that authors include a statement of ethical aspects and future societal consequences extending to both positive and negative outcomes. Since then, requiring broader impact statements in machine learning papers has become a thing.

The 2024 NeurIPS call has not yet been released, but in 2023 authors were required to complete a checklist where they had to respond to the following: “If appropriate for the scope and focus of your paper, did you discuss potential negative societal impacts of your work?”, with either Y, N, or N/A with explanation as appropriate. More recently, ICML introduced a requirement that authors include impact statements in submitted papers: “a statement of the potential broader impact of their work, including its ethical aspects and future societal consequences. This statement should be in a separate section at the end of the paper (co-located with Acknowledgements, before References), and does not count toward the paper page limit.”

ICML provided authors who didn’t feel they had much to say the following boiler-plate text:

“This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here.”  

but warned authors to “to think about whether there is content which does warrant further discussion, as this statement will be apparent if the paper is later flagged for ethics review.”

I find this slightly amusing in that it sounds like what I would expect authors to be thinking even without an impact statement: This work is like, so impactful, for society at large. It’s just like, really important, on so many levels. We’re out of space unfortunately, so we’ll have to leave it at that.\newline\newline\newline\newline Love, \newline\newline\newline\newline the authors \newline\newline\newline\newline

I have an idea that might increase the value of the exercises, both for authors and those advocating for the requirements: Have authors address potential impacts in the context of their discussion of related work *with references to relevant critical work*, rather than expecting them to write something based on their own knowledge and impressions (which is likely to be hard for many authors for reasons I discuss below).  In other words, treat the impact statement as another dimension of contextualizing one’s work against existing scholarship, rather than a free-form brainstorm.

Why do I think this could be an improvement?  Here’s what I see as the main challenges these measures run into (both my own thoughts and those discussed by others):  

  1. Lack of incentives for researchers to be forthright about possible negative implications of their work, and consequently a lack of depth in the statements they write. Having them instead find and cite existing critical work on ethical or societal impacts doesn’t completely reconcile this, but presumably the critical papers aren’t facing quite the same incentives to say only the minimum amount. I expect it is easier for the authors to refer to the kind of critiques that ethics experts think are helpful than it is for them to write such critical reflections themselves.
  2. Lack of transparency around how impacts statements factor into reviews of papers. Authors perceive reviewing around impacts statements as a black box, and have responded negatively to the idea that their paper could potentially get rejected for not sufficiently addressing broader impacts. But authors have existing expectations about the consequences for not citing some relevant piece of prior work.
  3. Doubts about whether AI/ML researchers are qualified to be reflecting on the broader impacts of their work. Relative to say, the humanities, or even areas of computer science that are closer to social science, like HCI, it seems pretty reasonable to assume that researchers submitting machine learning papers are less likely to gravitate to and be skilled at thinking about social and ethical problems, but skilled at thinking about technical problems. Social impacts of technology require different sensibilities and training to make progress on (though I think there are also technical components to these problems as well, which is why both sides are needed). Why not acknowledge this by encouraging the authors to first consult what has been said by experts in these areas, and add their two cents only if there are aspects of the possible impacts or steps to be taken to address them (e.g., algorithmic solutions) that they perceive to be unaddressed by existing scholarship? This would better acknowledge that just any old attempt to address ethics is not enough (consider, e.g., Gemini’s attempt not to stereotype, which was not an appropriate way to integrate ethical concerns into the tech). It would also potentially encourage more exchange between what currently can appear to be two very divided camps of researchers.
  4. Lack of established processes for reflecting on ethical implications in time to do something about them (e.g., choose a different research direction) in tech research. Related work is often one of the first sections to be written in my experience, so at least those authors who start working on their paper in advance of the deadline might have a better chance of acknowledging potential problems and adjusting their work in response. I’m less convinced that this will make much of a difference in many cases, but thinking about ethical implications early is part of the end goal of requiring broader impacts statements as far as I can tell, and my proposal seems more likely to help than hurt for that goal.

The above challenges are not purely coming from my imagination. I was involved in a couple survey papers led by Priyanka Nanayakkara on what authors said in NeurIPS broader impacts statements, and many contained fairly vacuous statements that might call out buzzwords like privacy or fairness but didn’t really engage with existing research. If we think it’s important to properly understand and address potential negative societal impacts of technology, which is the premise of requiring impacts statements to begin with, why expect a few sentences that authors may well be adding at the last minute to do this justice? (For further evidence that that is what’s happening in some cases, see e.g., this paper reporting on the experiences of authors writing statements). Presumably the target audience of the impact statements would benefit from actual scholarship on the societal implications over rushed and unsourced throwing around of ethical-sounding terms. And the authors would benefit from having to consult what those who are investing the time to think through potential negative consequences carefully have to say.

Some other positive byproducts of this might be that the published record does a better job of pointing awareness to where critical scholarship needs to be further developed (again, leading to more of a dialogue between the authors and the critics). This seems critical, as some of the societal implications of new ML contributions will require both ethicists and technologists to address. And those investing the time to think carefully about potential implications should see more engagement with their work among those building the tools.

I described this to Priyanka, who also read a draft of this post, and she pointed out that an implicit premise of the broader impact requirements is that the authors are uniquely positioned to comment on the potential harms of their work pre-deployment. I don’t think this is totally off base (since obviously the authors understand the work at a more detailed level than most critics), but to me it misses a big part of the problem: that of misaligned incentives and training (#1, #3 above). It seems contradictory to imply that these potential consequences are not obvious and require careful reflection AND that people who have not considered them before will be capable of doing a good job at articulating them.

At the end of the day, the above proposal is an attempt to turn an activity that I suspect currently feels “religious” for many authors into something they can apply their existing “secular” skills to. 

On the border between credulity and postmodernism: The case of the UFO’s-as-space-aliens media insiders

I came across this post from Tyler Cowen:

From an email I [Cowen] sent to a well-known public intellectual:

I think the chance that the bodies turn out to be real aliens is quite low.

But the footage seems pretty convincing, a way for other people to see what…sources have been telling me for years. [Everyone needs to stop complaining that there are no photos!]

And to think it is a) the Chinese, b) USG secret project, or…whatever…*in Mexico* strains the imagination.

It is interesting of course how the media is not so keen to report on this. They don’t have to talk about the aliens, they could just run a story “The Mexican government has gone insane.” But they won’t do that, and so you should update your mental model of the media a bit in the “they are actually pretty conservative, in the literal sense of that term, and quite readily can act like a deer frozen in the headlights, though at some point they may lurch forward with something ill-conceived.”

Many of you readers are from Christian societies, or you are Christian. But please do not focus on the bodies! I know you are from your early upbringing “trained” to do so, even if you are a non-believer. Wait until that evidence is truly verified (and I suspect it will not be). Focus on the video footage.

In any case, the Mexican revelations [sic] mean this issue is not going away, and perhaps this will force the hand of the USG to say more than they otherwise would have.

The above-linked post seems ridiculous to me, while comments on the post are much more reasonable—I guess it’s not hard to be reasonable when all you have to do is laugh at a silly hoax.

From a straight-up econ point of view I guess it makes sense that there has been a continuing supply of purported evidence for space aliens: there’s a big demand for this sort of thing so people will create some supply. It’s disappointing to me to see someone as usually-savvy as Cowen falling for this sort of thing, but (a) there’s some selection bias, as I’m not writing about all the people out there who have not been snookered by this Bermuda triangle ancient astronauts Noah’s ark fairies haunted radios bigfoot ESP ghosts space aliens stuff.

Given my earlier post on news media insiders getting all excited about UFOs (also this), you won’t be surprised to hear that I’m annoyed by Cowen’s latest. It’s just so ridiculous! Amusingly, his phrasing, “I think the chance that the bodies turn out to be real aliens is quite low,” echoes that of fellow contrarian pundit Nate Silver, who wrote, “I’m not saying it’s aliens, it’s almost definitely not aliens.” Credit them for getting the probability on the right side of 50%, but . . . c’mon.

As I wrote in my earlier posts, what’s noteworthy is not that various prominent people think that UFO’s might be space aliens—as I never tire of saying in this context, 30% of Americans say they believe in ghosts, which have pretty much the same basis in reality—rather, what’s interesting is that they feel so free to admit this belief. I attribute this to a sort of elite-media contagion: Ezra Klein and Tyler Cowen believe the space aliens thing is a possibility, they’re smart guys, so other journalists take it more seriously, etc. Those of us outside the bubble can just laugh, but someone like Nate Silver is too much of an insider and is subject to the gravitational pull of elite media, twitter, etc.

Mark Palko offers a slightly different take, attributing the latest burst of elite credulity to the aftereffects of a true believer who managed to place a few space-aliens-curious stories into the New York Times, which then gave the story an air of legitimacy etc.

The space aliens thing is interesting in part because it does not seem strongly connected to political polarization. You’ve got Cowen on the right, Klein on the left, and Silver on the center-left. OK, just three data points, but still. Meanwhile, Cowen gets a lot of far-right commenters, but most of the commenters to his recent post are with me on this one, just kind of baffled that he’s pushing the story.

Postmodernism

A couple days after seeing Cowen’s post, I happened to be reading a book that discussed postmodernism in the writing of history. I don’t care so much about postmodernism, but the book was interesting; I’ll discuss it in a future post.

In any case, here’s the connection I saw.

Postmodernism means different things to different people, but one of its key tenets is that there is no objective truth . . . uhhhh, let me just “do a wegman” here and quote wikipedia:

Postmodernism is an intellectual stance or mode of discourse which challenges worldviews associated with Enlightenment rationality dating back to the 17th century. Postmodernism is associated with relativism and a focus on the role of ideology in the maintenance of economic and political power. Postmodernists are “skeptical of explanations which claim to be valid for all groups, cultures, traditions, or races, and instead focuses on the relative truths of each person”. It considers “reality” to be a mental construct. Postmodernism rejects the possibility of unmediated reality or objectively-rational knowledge, asserting that all interpretations are contingent on the perspective from which they are made; claims to objective fact are dismissed as naive realism.

One thing that struck me about Cowen’s post was not just that he’s sympathetic to the space-aliens hypothesis; it also seems to bug him that the elite news media isn’t covering it more widely. Which is funny, because it bugs me that the media (including Bloomberg columnist Cowen) are taking it as seriously as they do!

Cowen writes, “It is interesting of course how the media is not so keen to report on this.” Doesn’t seem so interesting to me! My take is that most people in the media have some common sense and also have some sense of the history of this sort of nexus of hoaxes and credibility, from Arthur Conan Doyle onward.

The postmodernism that I see coming from Cowen is in the statement, “the footage seems pretty convincing, a way for other people to see what . . . sources have been telling me for years,” which seems to me, as a traditional rationalist or non-postmodernist, to be a form of circular reasoning, saying that something is real because people believe in it. Saying “this issue is not going away” . . . I mean, sure, astrology isn’t going away either! Unfortunately, just about nothing ever seems to go away.

Oppositionism

There’s something else going on here, it’s hard for me to put my finger on, exactly . . . something about belief in the occult as being oppositional, something “they” don’t want you to know about, whether “they” is “the media” or “the government” or “organized religion” or “the patriarchy” or “the medical establishment” or whatever. As we discussed in an earlier post on the topic, one interesting thing is how things happen that push certain fringe beliefs into a zone where it’s considered legitimate to take them seriously. As a student of public opinion and politics, I’m interested not just in who has these beliefs and why, but also in the processes by which some such beliefs but not others circulate so that they seem perfectly normal to various people such as Cowen, Silver, etc., in the elite news media bubble.

“Science as Verified Trust”

Interesting post by Sean Manning:

There seems to be a lot of confusion about the role of trust in science or scholarship. Engineers such as Bill Nye and political propagandists throw around the phrase “trust the science”! On the other hand, the rationalists whom I mentioned last year brandish the Royal Society’s motto nullius in verba “Take nobody’s word for it” like a sword. I [Manning] think both sides are working from some misconceptions about how science or scholarship work. . . .

What makes this scientific or scholarly is not that you do every step yourself. It is that every step of the argument has been checked by multiple independent people, so in most cases you can quickly see if those people disagree and then trust those preliminary steps. Science or scholarship is not about heroes who know every skill, its about systems of questioning and verification which let us provisionally assume that some things are true while we focus on something where we are not sure of the answer. . . .

Why we say that honesty and transparency are not enough:

Someone recently asked me some questions about my article from a few years ago, Honesty and transparency are not enough. I thought it might be helpful to summarize why I’ve been promoting this idea.

The central message in that paper is that reproducibility is great, but if a study is too noisy (with the bias and variance of measurements being large compared to any persistent underlying effects), then making it reproducible won't solve those problems. I wrote it for three reasons:

(a) I felt that reproducibility (or, more generally, "honesty and transparency") was being oversold, and I didn't want researchers to think that just cos they drink the reproducibility elixir, their studies will then be good. Reproducibility makes it harder to fool yourself and others, but it does not turn a hopelessly noisy study into good science (see the simulation sketch after this list).

(b) Lots of researchers are honest and transparent in their work but still do bad research. I wanted to be able to say that their research is bad without implying that I think they are being dishonest.

(c) Conversely, I was concerned that, when researchers heard about problems with bad research by others, they would think that the people who are doing that bad research are cheating in some way. This leads to the problem of researchers saying to themselves, “I’m honest, I don’t ‘p-hack,’ so my research can’t be bad.” Actually, though, lots of people do research that’s honest, transparent, and useless! That’s one reason I prefer to speak of “forking paths” rather than “p-hacking”: it’s less of an accusation and more of a description.
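
To make the point in (a) concrete, here is a minimal simulation sketch of my own (not from the paper); it assumes a small true effect buried in measurement noise that is ten times larger, then runs the same simple, fully reproducible two-group comparison many times. The effect size, noise level, and sample size are all invented for illustration.

```python
# Minimal sketch: a fully reproducible but hopelessly noisy study design.
# All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(2024)

true_effect = 0.1   # small persistent underlying effect
sigma = 1.0         # measurement noise, much larger than the effect
n = 50              # observations per group
n_sims = 10_000     # number of simulated replications of the study

n_signif = 0
wrong_sign = 0
exaggeration = []
for _ in range(n_sims):
    treated = rng.normal(true_effect, sigma, n)
    control = rng.normal(0.0, sigma, n)
    est = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / n + control.var(ddof=1) / n)
    if abs(est / se) > 1.96:              # the study "finds" an effect
        n_signif += 1
        wrong_sign += est < 0
        exaggeration.append(abs(est) / true_effect)

print(f"proportion of simulated studies reaching significance: {n_signif / n_sims:.2f}")
print(f"average exaggeration factor among those: {np.mean(exaggeration):.1f}x")
print(f"share of those with the wrong sign: {wrong_sign / n_signif:.2f}")
```

Every one of these simulated studies is honest, transparent, and perfectly reproducible, yet the ones that clear the significance threshold overstate the true effect severalfold and occasionally get its sign backward (what I've elsewhere called type M and type S errors). That's the sense in which reproducibility doesn't rescue a noisy design.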

Stabbers gonna stab — fraud edition

One of the themes of Dan Davies's book, Lying for Money, was that fraudsters typically do their crimes over and over again, until they get caught. And then, when they are released from prison, they do it again. This relates to something I noticed in the Theranos story, which was that the fraud was in plain sight for many years and the fraudsters continued to operate in the open.

Also regarding that interesting overlap of science and business fraud, I noted:

There seem to have been two ingredients that allowed Theranos to work. And neither of these ingredients involved technology or medicine. No, the two things were:

1. Control of the narrative.

2. Powerful friends.

Neither of these came for free. Theranos’s leaders had to work hard, for long hours, for years and years, to maintain control of the story and to attract and maintain powerful friends. And they needed to be willing to lie.

The newest story

Ben Mathis-Lilley writes:

On Wednesday, the Department of Justice announced that it has arrested a 48-year-old Lakewood, New Jersey, man named Eliyahu “Eli” Weinstein on charges of operating, quote, “a Ponzi scheme.” . . . How did authorities know that Weinstein was operating a Ponzi scheme? For one thing, he allegedly told associates, while being secretly recorded, that he had “Ponzied” the money they were using to repay investors. . . . Weinstein is further said to have admitted while being recorded that he had hidden assets from federal prosecutors. (“I hid money,” he is said to have told his conspirators, warning them that they would “go to jail” if anyone else found out.) . . .

These stories of “least competent criminals” are always fun, especially when the crime is nonviolent so you don’t have to think too hard about the victims.

What brings this one to the next level is the extreme repeat-offender nature of the criminal:

There was also one particular element of Weinstein’s background that may have alerted the DOJ that he was someone to keep an eye on—namely, that he had just been released from prison after serving eight years of a 24-year sentence for operating Ponzi schemes. More specifically, Weinstein was sentenced to prison for operating a Ponzi scheme involving pretend real estate transactions, then given a subsequent additional sentence for operating a second Ponzi scheme, involving pretend Facebook stock purchases, that he conducted after being released from custody while awaiting trial on the original charges.

Kinda like when a speeding driver runs over some kid and then it turns out the driver had 842 speeding tickets and the cops had never taken away his car, except in this case there’s no dead kid and the perp had already received a 24-year prison sentence.

How is it that he got out after serving only 8 years, anyway?

In January 2021, Weinstein was granted clemency by President Donald Trump at the recommendation of, among others, “the lawyer Alan Dershowitz,” who has frequently been the subject of news coverage in recent years for his work representing Trump and his relationship with the late Jeffrey Epstein.

Ahhhhh.

This all connects to my items #1 and 2 above.

The way Weinstein succeeded (to the extent he could be considered a success) at fraud was control of the narrative. And he got his get-out-of-jail-free card from his powerful friends. “Finding your roots,” indeed.

Stabbers gonna stab

This all reminded me of a story that came out in the newspaper a few decades ago. Jack Henry Abbott was a convicted killer who published a book while in prison. Abbott’s book was supposed to be very good, and he was subsequently released on parole with the support of various literary celebrities including Norman Mailer. Shortly after his release, Abbott murdered someone else and returned to prison, where he spent the rest of his life.

The whole story was very sad, but what made it particularly bizarre was that Abbott’s first murder was a stabbing, his second murder was a stabbing, and his most prominent supporter, Mailer, was notorious for . . . stabbing someone.

Lefty Driesell and Bobby Knight

This obit of the legendary Maryland basketball coach reminded me of a discussion we had a few years ago. It started with a remark in a published article by political scientist Diana Mutz identifying herself as “a Hoosier by birth and upbringing, the daughter of a former Republican officeholder, and someone who still owns a home in Mike Pence’s hometown.”

That’s interesting: I don’t know so many children of political officeholders! Actually, I can’t think of anyone I know, other than Mutz, who is a child of a political officeholder, but perhaps there are some such people in my social network. I don’t know the occupations of most of my friends’ parents.

Anyway, following up on that bit from Mutz, sociologist Steve Morgan added some background of his own:

I was also born in Indiana, and in fact my best friend in the 1st grade, before I left the state, was Pat Knight. To me, his father, Bobby Knight was a pleasant and generally kind man (who used to give us candy bars, etc.). He turned out to be a Trump supporter, and probably his son too. So, in addition to not appreciating his full basketball personality when I was 6 years old, I also did not see his potential to find a demagogue inspiring. We moved to Ohio, where I received a lot of education in swing-state politics and Midwestern resentment of coastal elites.

And then I threw in my two cents:

I was not born in Indiana, but I grew up in suburban Maryland (about 10 miles from Brett Kavanaugh, but I went to a public school in a different part of the county and so had zero social overlap with his group). One of the kids in my school was Chuck Driesell, son of Lefty Driesell, former basketball coach at the University of Maryland. Lefty is unfortunately now most famous for his association with Len Bias, but Chuck and I were in high school before that all happened, when Lefty was famous for being a good coach who couldn’t ever quite beat North Carolina. Once I remember the Terps decided to beat Dean Smith at his own game by doing the four corners offense themselves. But it didn’t work; I think Maryland ended up losing 21-18 or some other ping-pong-like score. Chuck was in my economics class. I have no idea if he’s now a Trump supporter. I guess it’s possible. One of the other kids in that econ class was an outspoken conservative, one of the few Reagan supporters of our group of friends back in 1980. Chuck grew up and became a basketball coach; the other kid grew up and became an economist.

I never went to a Maryland basketball game all the time I lived there, even when I was a student at the university. I wish I'd gone; I bet it would've been a lot of fun. My friends and I played some pickup soccer and basketball, and I watched lots of sports on TV, but for whatever reason we never even considered the idea of going to a game. We didn't attend any of the high school football games either, even though our school's team was the state champions. This was not out of any matter of principle; we just never thought of going. Our loss.

If school funding doesn’t really matter, why do people want their kid’s school to be well funded?

A question came up about the effects of school funding on student performance, and we were referred to this review article from a few years ago by Larry Hedges, Terri Pigott, Joshua Polanin, Ann Marie Ryan, Charles Tocci, and Ryan Williams:

One question posed continually over the past century of education research is to what extent school resources affect student outcomes. From the turn of the century to the present, a diverse set of actors, including politicians, physicians, and researchers from a number of disciplines, have studied whether and how money that is provided for schools translates into increased student achievement. The authors discuss the historical origins of the question of whether school resources relate to student achievement, and report the results of a meta-analysis of studies examining that relationship. They find that policymakers, researchers, and other stakeholders have addressed this question using diverse strategies. The way the question is asked, and the methods used to answer it, is shaped by history, as well by the scholarly, social, and political concerns of any given time. The diversity of methods has resulted in a body of literature too diverse and too inconsistent to yield reliable inferences through meta-analysis. The authors suggest that a collaborative approach addressing the question from a variety of disciplinary and practice perspectives may lead to more effective interventions to meet the needs of all students.

I haven’t followed this literature carefully. It was my vague impression that studies have found effects of schools on students’ test scores to be small. So, not clear that improving schools will do very much. On the other hand, everyone wants their kid to go to a good school. Just for example, all the people who go around saying that school funding doesn’t matter, they don’t ask to reduce the funding of their own kids’ schools. And I teach at an expensive school myself. So lots of pieces here, hard for me to put together.

I asked education statistics expert Beth Tipton what she thought, and she wrote:

I think the effect of money depends upon the educational context. For example, in higher education at selective universities, the selection process itself is what ensures success of students – the school matters far less. But in K-12, and particularly in under resourced areas, schools and finances can matter a lot – thus the focus on charter schools in urban locales.

I guess the problem here is that I’m acting like the typical uninformed consumer of research. The world is complicated, and any literature will be a mess, full of claims and counter-claims, but here I am expecting there to be a simple coherent story that I can summarize in a short sentence (“Schools matter” or “Schools don’t matter” or, maybe, “Schools matter but only a little”).

Given how frustrated I get when others come into a topic with this attitude, I guess it’s good for me to recognize when I do it.

Hey, here’s some free money for you! Just lend your name to this university and they’ll pay you $1000 for every article you publish!

Remember that absolutely ridiculous claim that scientific citations are worth $100,000 each?

It appears that someone is taking this literally. Or, nearly so. Nick Wise has the story:

A couple of months ago a professor received the following email, which they forwarded to me.

Dear esteemed colleagues,

We are delighted to extend an invitation to apply for our prestigious remote research fellowships at the University of Religions and Denominations (URD) . . . These fellowships offer substantial financial support to researchers with papers currently in press, accepted or under review by Scopus-indexed journals. . . .

Fellowship Type: Remote Short-term Research Fellowship. . . .

Affiliation: Encouragement for researchers to acknowledge URD as their additional affiliation in published articles.

Remuneration: Project-based compensation for each research article.

Payment Range: Up to $1000 USD per article (based on SJR journal ranking). . . .

Why would the institution pay researchers to say that they are affiliated with them? It could be that funding for the university is related to the number of papers published in indexed journals. More articles associated with the university can also improve their placing in national or international university rankings, which could lead directly to more funding, or to more students wanting to attend and bringing in more money.

The University of Religions and Denominations is a private Iranian university . . . Until recently the institution had very few published papers associated with it . . . and their subject matter was all related to religion. . . . However, last year there was a substantial increase to 103 published papers, and so far this year there are already 35. This suggests that some academics have taken them up on the offer in the advert to include URD as an affiliation.

Surbhi Bhatia Khan is a lecturer in data science at the University of Salford in the UK since March 2023 and a top 2% scientist in the world according to Stanford University’s rankings. She published 29 research articles last year according to Dimensions, an impressive output, in which she was primarily affiliated to the University of Salford. In addition though, 5 of those submitted in the 2nd half of last year had an additional affiliation at the Department of Engineering and Environment at URD, which is not listed as one of the departments on the university website. Additionally, 19 of the 29 state that she’s affiliated to the Lebanese American University in Beirut, which she was not affiliated with before 2023. She is yet to mention her role at either of these additional affiliations on her LinkedIn profile.

Looking at the Lebanese American University, another private university, its publication numbers have shot up from 201 in 2015 to 503 in 2021 and 2,842 in 2023, according to Dimensions. So far in 2024 they have published 525, on track for over 6,000 publications for the year. By contrast, according to the university website, the faculty consisted of 547 full-time staff members in 2021 but had shrunk to 423 in 2023. It is hard to imagine how such growth in publication numbers could occur without a similar growth in the faculty, let alone with a reduction.

Wise writes:

How many other institutions are seeing incredible increases in publication numbers? Last year we saw gaming of the system on a grand scale by various Saudi Arabian universities, but how many offers like the one above are going around, whether by email or sent through Whatsapp groups or similar?

It’s bad news when universities in England, Iran, Saudi Arabia, and Lebanon start imitating the corrupt citation practices that we have previously associated with nearby Cornell University.

But I can see where Dr. Khan is coming from: if someone’s gonna send you free money, why not take it? Even if the “someone” is a University of Religions and Denominations, and none of your published research relates to religion, and you list an affiliation with an apparently nonexistent department.

The only thing that's bugging me is that, according to an esteemed professor at Northeastern University, citations are worth $100,000 each—indeed, we are told that it is possible to calculate "exactly how much a single citation is worth." In that case, Dr. Khan is getting ripped off by the University of Religions and Denominations, which is offering a paltry "up to $1000"—and that's per article, not per citation! I know about transaction costs etc. but maybe she could at least negotiate them up to $2000 per.

I can’t imagine this scam going on for long, but while it lasts you might as well get in on it. Why should professors at Salford University have all the fun?

Parting advice

Just one piece of advice for anyone who’s read this far down into the post: if you apply for the “Remote Short-term Research Fellowship” and you get it, and you send them the publication notice for your article that includes your affiliation with the university, and then they tell you that they’ll be happy to send you a check for $1000, you just have to wire them a $10 processing fee . . . don’t do it!!!

I’ve been mistaken for a chatbot

… Or not, according to what language is allowed.

At the start of the year I mentioned that I am on a bad roll with AI just now, and the start of that roll began in late November when I received reviews back on a paper. One reviewer sent in a 150-word review saying it was written by chatGPT. The editor echoed, "One reviewer asserted that the work was created with ChatGPT. I don't know if this is the case, but I did find the writing style unusual ...." What exactly was unusual was not explained.

That was November 20th. By November 22nd my computer shows a file created named ‘tryingtoproveIamnotchatbot,’ which is just a txt where I pasted in the GitHub commits showing progress on the paper. I figured maybe this would prove to the editors that I did not submit any work by chatGPT.

I didn’t. There are many reasons for this. One is I don’t think that I should. Further, I suspect chatGPT is not so good at this (rather specific) subject and between me and my author team, I actually thought we were pretty good at this subject. And I had met with each of the authors to build the paper, its treatise, data and figures. We had a cool new meta-analysis of rootstock x scion experiments and a number of interesting points. Some of the points I might even call exciting, though I am biased. But, no matter, the paper was the product of lots of work and I was initially embarrassed, then gutted, about the reviews.

Once I was less embarrassed I started talking timidly about it. I called Andrew. I told folks in my lab. I got some fun replies. Undergrads in my lab (and others later) thought the review itself may have been written by chatGPT. Someone suggested I rewrite the paper with chatGPT and resubmit. Another that I just write back one line: I’m Bing.

What I took away from this was myriad, but I came up with a couple of next steps. One was that this was not a great peer review process and that I should reach out to the editor (and, as one co-author suggested, cc the editorial board). Another was to not be so mortified as to not talk about this.

What I took away from these steps were two things:

1) chatGPT could now control my language.

I connected with a senior editor on the journal. No one is in a good position here, and the editor and reviewers are volunteering their time in a rapidly changing situation. I feel for them and for me and my co-authors. The editor and I tried to bridge our perspectives. It seems he could not have imagined that I or my co-authors would be so offended. And I could not have imagined that the journal already had a policy of allowing manuscripts to use chatGPT, as long as it was clearly stated.

I was also given some language changes to consider, so I might sound less like chatGPT to reviewers. These included some phrases I wrote in the manuscript (e.g., "the tyranny of terroir"). Huh. So where does that end? Say I start writing so I sound less to the editor and others "like chatGPT" (and I never figured out what that means), then chatGPT digests that and then what? I adapt again? Do I eventually come back around to those phrases once they have rinsed out of the large language model?

2) Editors are shaping the language around chatGPT.

Motivated by a co-author’s suggestion, I wrote a short reflection which recently came out in a careers column. I much appreciate the journal recognizing this as an important topic and that they have editorial guidelines to follow for clear and consistent writing. But I was surprised by the concerns from the subeditors on my language. (I had no idea my language was such a problem!)

The problem was that I wrote: I've been mistaken for a chatbot (and similar language). The argument was that I had not been mistaken — my writing had been. The debate that ensued was fascinating. If I had been in a chatroom and this happened, then I could write "I've been mistaken for a chatbot," but since my co-authors and I wrote this up and submitted it to a journal, it was not part of our identities. So I was over-reaching in my complaint. I started to wonder: if I could not say "I was mistaken for an AI bot," why does the chatbot get "to write"? I went down an existential hole, from which I have not fully recovered.

And since then I am still mostly existing there. On the upbeat side, writing the reflection was cathartic and the back and forth with the editors — who I know are just trying to do their jobs too — gave me more perspectives and thoughts, however muddled. And my partner recently said to me, "perhaps one day it will be seen as a compliment to be mistaken for a chatbot, just not today!"

Also, since I don't know of an archive that takes such things, I will paste the original unedited version below.

I have just been accused of scientific fraud. It’s not data fraud (which, I guess, is a relief because my lab works hard at data transparency, data sharing and reproducibility). What I have just been accused of is writing fraud. This hurts, because—like many people—I find writing a paper a somewhat painful process.

Like some people, I comfort myself by reading books on how to write—both to be comforted by how much the authors of such books stress that writing is generally slow and difficult, and to find ways to improve my writing. My current writing strategy involves willing myself to write, multiple outlines, then a first draft, followed by much revising. I try to force this approach on my students, even though I know it is not easy, because I think it’s important we try to communicate well.

Imagine my surprise then when I received reviews back that declared a recently submitted paper of mine a chatGPT creation. One reviewer wrote that it was "obviously Chat GPT" and the handling editor vaguely agreed, saying that they found "the writing style unusual." Surprise was just one emotion I had; so were shock, dismay, and a flood of confusion and alarm. Given how much work goes into writing a paper, it was quite a hit to be accused of being a chatbot—especially in short order without any evidence, and given the efforts that accompany the writing of almost all my manuscripts.

I hadn't written a word of the manuscript with chatGPT and I rapidly tried to think through how to prove my case. I could show my commits on GitHub (with commit messages including "finally writing!" and "Another 25 mins of writing progress!" that I never thought I would share), I could try to figure out how to compare the writing style of my pre-chatGPT papers on this topic to the current submission, maybe I could ask chatGPT if it thought it wrote the paper.... But then I realized I would be spending my time trying to prove I am not a chatbot, which seemed a bad outcome to the whole situation. Eventually, like all mature adults, I decided what I most wanted to do was pick up my ball (manuscript) and march off the playground in a small fury. How dare they?

Before I did this, I decided to get some perspectives from others—researchers who work on data fraud, co-authors on the paper, and colleagues—and I found most agreed with my alarm. One put it most succinctly to me: "All scientific criticism is admissible, but this is a different matter."

I realized these reviews captured both something inherently broken about the peer review process and—more importantly to me—about how AI could corrupt science without even trying. We’re paranoid about AI taking over us weak humans and we’re trying to put in structures so it doesn’t. But we’re also trying to develop AI so it helps where it should, and maybe that will be writing parts of papers. Here, chatGPT was not part of my work and yet it had prejudiced the whole process simply by its existential presence in the world. I was at once annoyed at being mistaken for a chatbot and horrified that reviewers and editors were not more outraged at the idea that someone had submitted AI generated text.

So much of science is built on trust and faith in the scientific ethics and integrity of our colleagues. We mostly trust others did not fabricate their data, and I trust people do not (yet) write their papers or grants using large language models without telling me. I wouldn't accuse someone of data fraud or p-hacking without some evidence, but a reviewer felt it was easy enough to accuse me of writing fraud. Indeed, the reviewer wrote, "It is obviously [a] Chat GPT creation, there is nothing wrong using help ...." So it seems, perhaps, that they did not see this as a harsh accusation, and the editor thought nothing of passing it along and echoing it, but they had effectively accused me of lying and fraud in deliberately presenting AI-generated text as my own. They also felt confident that they could discern my writing from AI—but they couldn't.

We need to be able to call out fraud and misconduct in science. Currently, the costs to the people who call out data fraud seem too high to me, and the consequences for being caught too low (people should lose tenure for egregious data fraud in my book). But I am worried about a world in which a reviewer can casually declare my work AI-generated, and the editors and journal editor simply shuffle along the review and invite a resubmission if I so choose. It suggests not only a world in which the reviewers and editors have no faith in the scientific integrity of submitting authors—me—but also an acceptance of a world where ethics are negotiable. Such a world seems easy for chatGPT to corrupt without even trying—unless we raise our standards.

Side note: Don’t forget to submit your entry to the International Cherry Blossom Prediction Competition!

“Don’t feed the trolls” and the troll semi-bluff

I was dealing with some trolling in blog comments a while ago and someone sent me an email of support, appreciating my patience in my responses to the troll. The standard advice is "Don't feed the trolls," but usually here it seems to have worked well to give polite but firm and focused responses. One reason for this is that this is a blog, not twitter, so a troll can't get hundreds or thousands of "likes" here for just expressing an opinion; instead, there's room in comments for all sides to make their arguments. So often we get the best possible outcome: the would-be troll gets to make his point at whatever length he wants, others can respond, and the discussion is out there. Polite but firm responses take some of the thrill out of trolling; the people who really want to up the emotional level can go to twitter or 4chan.

Occasionally, though, a troll keeps coming back and making the same point over and over again, with a combination of rudeness, repetition, and lack of content that degrades the discussion, and I have to ask the troll to stop, or, with very rare necessity, to block him from commenting.

The semi-bluff

In poker, a "semi-bluff" is when you bluff, but your hand has some potential for improvement, so if you get called on it, you still have a possible out. I get the impression that the occasional trolling commenters here are engaged in a sort of semi-trolling. On one hand, they want to provoke strong reactions, and they seem to see themselves as charming, if not completely house-trained, gadflies or imps. So they can keep looping around the same refuted points over and over again on the ha-ha-troll theory that the rest of us have no sense of perspective. At the same time, they seem to sincerely believe in the deep truth of whatever they happen to be saying. (I'm guessing that even flat-out data fabricators, liars, and hacks believe that, in some deep sense, they're serving a good cause.) So they're kinda trolling but also are being sincere at some level.

The disturbing thing is that frequent blog targets such as plagiarists, nudgelords, pizzagate proponents, sloppy pundits, confused popularizers, etc., probably see me as this kind of troll! I keep banging on and on and never give up. That horse thing. That Javert thing.

As is often the case, I’m stuck here in the pluralist’s dilemma. There’s no logical way to criticize trolls in general without entering the vortex and calling into question my own work that is getting trolled. Cantor or Russell would understand.

Lancet-bashing!

Retraction Watch points to this fun article by Ashley Rindsberg, "The Lancet was made for political activism," subtitled, "For 200 years, it has thrived on melodrama and scandal."

And they didn’t even mention Surgisphere (for more detail, see here) or this story (the PACE study) or this one about gun control.

All journals publish bad papers; we notice the Lancet's more because it gets more publicity.