“Participants reported being hungrier when they walked into the café (mean = 7.38, SD = 2.20) than when they walked out [mean = 1.53, SD = 2.70, F(1, 75) = 107.68, P < 0.001].”

Jonathan Falk came across this article and writes:

Is there any possible weaker conclusion than “providing caloric information may help some adults with food decisions”?

Is there any possible dataset which would contradict that conclusion?

On one hand, gotta give the authors credit for not hyping or overclaiming. On the other hand, yeah, the statement, “providing caloric information may help some adults with food decisions,” is so weak as to be essentially empty. I wonder whether part of the problem here is the convention that the abstract is supposed to conclude with some general statement, something more than just, “That’s what we found in our data.”

Still and all, this doesn’t reach the level of the classic “Participants reported being hungrier when they walked into the café (mean = 7.38, SD = 2.20) than when they walked out [mean = 1.53, SD = 2.70, F(1, 75) = 107.68, P < 0.001].”

Lancet-bashing!

Retraction Watch points to this fun article by Ashley Rindsberg, “The Lancet was made for political activism,” subtitled, For 200 years, it has thrived on melodrama and scandal.

And they didn’t even mention Surgisphere (for more detail, see here) or this story (the PACE study) or this one about gun control.

All journals publish bad papers; we notice Lancet’s more because they get more publicity.

Hey, I got tagged by RetractoBot!

A message came in my inbox from “The RetractoBot Team, University of Oxford,” with subject line, “RetractoBot: You cited a retracted paper”:

That’s funny! When we cited that paper by LaCour and Green, we already knew it was no good. Indeed, that’s why we cited it. Here’s the relevant paragraph from our article:

In political science, the term “replication” has traditionally been applied to the simple act of reproducing a published result using the identical data and code as used in the original analysis. Anyone who works with real data will realize that this exercise is valuable and can catch problems with sloppy data analysis (e.g., the Excel error of Reinhart and Rogoff 2010, or the “gremlins” article of Tol 2009, which required nearly as many corrections as the number of points in its dataset; see Gelman 2014). Reexamination of raw data can also expose mistakes, such as the survey data of LaCour and Green (2014); see Gelman (2015).

We also cited two other notorious papers, Reinhart and Rogoff (2010) and Tol (2009), both of which should have been retracted but are still out there in the literature. According to Google Scholar, Reinhart and Rogoff (2010) has been cited more than 5000 times! I guess that many of these citations are from articles such as mine, using it as an example of poor workflow, but still. Meanwhile, Tol (2009) has been cited over 1500 times. It does have a “correction and update” from 2014, but that hardly covers its many errors and inconsistencies.

Anyway, I can’t blame RetractoBot for not noticing the sense of my citation; it’s just funny how they sent that message.

Sympathy for the Nudgelords: Vermeule endorsing stupid and dangerous election-fraud claims and Levitt promoting climate change denial are like cool dudes in the 60s wearing Che T-shirts and thinking Chairman Mao was cool—we think they’re playing with fire, they think they’re cute contrarians pointing out contradictions in the system. For a certain kind of person, it’s fun to be a rogue.

A few months ago I wrote about some disturbing stuff I’d been hearing about from Harvard Law School professors Cass Sunstein and Adrian Vermeule. The two of them wrote an article back in 2005 saying, “a refusal to impose [the death] penalty condemns numerous innocent people to death. . . . a serious commitment to the sanctity of human life may well compel, rather than forbid that form of punishment. . . .”

My own view is that the death penalty makes sense in some settings and not others. To say that “a serious commitment to the sanctity of human life may well compel” the death penalty . . . jeez, I dunno, that’s some real Inquisition-level thinking going on. Not just supporting capital punishment; they’re saying morality compels it. That’s a real edgelord attitude, kinda like the thought-provoking professor in your freshman ethics class who argues that companies have not just the right but the moral responsibility to pollute the maximum amount possible under the law because otherwise they’re ducking their fiduciary responsibility to the shareholders. Indeed, it’s arguably immoral to not pollute beyond the limits of the law if the expected gain from polluting is higher than the expected loss from getting caught and fined.

Sunstein and Vermeule also recommended that the government should fight conspiracy theories by engaging in “cognitive infiltration of extremist groups,” which seemed pretty rich, considering that Vermeule spent his online leisure hours after the 2020 election promoting election conspiracy theories. Talk about the fox guarding the henhouse. This is one guy I would not trust to be in charge of government efforts to cognitively infiltrate extremist groups!

Meanwhile, these guys go on NPR, they’ve held appointive positions with the U.S. government, they’re buddies with elite legal academics . . . it bothers me! I’m not saying their free speech should be suppressed—we got some Marxists running around in this country too—I just don’t want them anywhere near the levers of power.

Anyway, I heard by email from someone who knows Sunstein and Vermeule. It seems that both of them are nice guys, and when they stick to legal work and stay away from social science or politics they’re excellent scholars. My correspondent also wrote:

And on that 2019 Stasi tweet. Yes, it was totally out of line. You and others were right to denounce it. But I think it’s worth pointing out that he deleted the tweet the very same day (less than nine hours later), apologized for it as an ill-conceived attempt at humor, and noted with regret that the tweet came across as “unkind and harsh to good people doing good and important work.” I might gently and respectfully suggest that continuing to bring up this tweet four years later, after such a prompt retraction—which was coupled with an acknowledgement of the value of the work that you and others are doing in focusing on the need for scrutiny and replication of eye-catching findings—might be perceived as just a tad ungracious, even by those who believe that Cass was entirely in the wrong and you were entirely in the right as regards the original tweet. To paraphrase one of the great capital defense lawyers (who obviously said this in a much more serious context), all of us are better than our worst moment.

I replied:

– Regarding the disjunction between Vermeule’s scholarly competence and nice-guyness, on one hand, and his extreme political views: I can offer a statistical or population perspective. Think of a Venn diagram where the two circles are “reasonable person” and “extreme political views and actions.” (I’m adding “actions” here to recognize that the issue is not just that Vermeule thinks that a fascist takeover would be cool, but that he’s willing to sell out his intellectual integrity for it, in the sense of endorsing ridiculous claims.)

From an ethical point of view, there’s an argument in favor of selling out one’s intellectual integrity for political goals. One can make this argument for Vermeule or also for, say, Ted Cruz. The argument is that the larger goal (a fascist government in the U.S., or more power for Ted Cruz) is important enough that it’s worth making such a sacrifice. Or, to take slightly lesser examples, the argument would be that when Hillary Clinton lied about her plane being shot at, or when Donald Trump lied about . . . ok, just about everything, that they were thinking about larger goals. Indeed, one could argue that for Cruz and the other politicians, it’s not such a big deal—nobody expects politicians to believe half of what they’re saying anyway—but for Vermeule to trash his reputation in this way, that shows real commitment!

Actually, I’m guessing that Vermeule was just spending too much time online in a political bubble, and he didn’t really think that endorsing these stupid voter-fraud claims meant anything. To put it another way, you and I think that endorsing unsubstantiated claims of voting fraud is bad for three reasons: (1) intellectually it’s dishonest to claim evidence for X when you have no evidence for X, (2) this sort of thing is dangerous in the short term by supplying support to traitors, and (3) it’s dangerous in the long term by degrading the democratic process. But, for Vermeule, #2 and #3 might well be a plus not a minus, and, as for #1, I think it’s not uncommon for people to make a division between their professional and non-professional statements, and to have a higher standard for the former than the latter. Vermeule might well think, “Hey, that’s just twitter, it’s not real.” Similarly, the economist Steven Levitt and his colleagues wrote all sorts of stupid things (along with many smart things) under the Freakonomics banner, things which I guess (or, should I say, hope) he’d never have done in his capacity as an academic. Just to be clear, I’m not saying that everyone does this; indeed, I don’t think I do it—I stand by what I blog, just as I stand by my articles and books—but I don’t think everyone does. Another example that’s kinda famous is biologists who don’t believe in evolution. They can just separate the different parts of their belief systems.

Anyway, back to the Venn diagram. The point is that something like 30% of Americans believe this election fraud crap. 30% of Americans won’t translate into 30% of competent and nice-guy law professors, but it won’t be zero, either. Even if it’s only 10% or less in that Venn overlap, it won’t be zero. And the people inside that overlap will get attention. And some of them like the attention! So at that point you can get people going further and further off the deep end.

If it would help, you could think of this as a 2-dimensional scatterplot rather than a Venn diagram, and in this case you can picture the points drifting off to the extreme over time.

To look at this another way, consider various well-respected people in the U.S. and Britain who were communists in the 1930s through 1950s. Some of these people were scientists! And they said lots of stupid things. From a political perspective, that’s all understandable: even if they didn’t personally want to tear up families, murder political opponents, start wars, etc., they could make the case that Stalin’s USSR was a counterweight to fascism elsewhere. But from an intellectual perspective, they wouldn’t always make that sort of minimalist case. Some of them were real Soviet cheerleaders. Again, who knows what moral calculations they were making in their heads.

I’m not gonna go all Sunstein-level contrarian and argue that selling out one’s intellectual integrity is the ultimate moral sacrifice—I’m picturing a cartoon where Vermeule is Abraham, his reputation is Isaac, and the Lord is thundering above, booming down at him to just do it already—but I guess the case could be made, indeed maybe will be the subject of one of the 8 books that Sunstein comes out with next year and is respectfully reviewed on NPR etc.

– Regarding the capital punishment article: I have three problems here. The first is their uncritical acceptance of a pretty dramatic claim. In Sunstein and Vermeule’s defense, though, back in 2005 it was standard in social science for people to think that statistical significance + identification strategy + SSRN or NBER = discovery. Indeed, I’d guess that most academic economists still think that way! So to chide them on their innumeracy here would be a bit . . . anachronistic, I guess. The second problem is that, I’m guessing, they were so eager to accept this finding because it allowed them to make this cool point that they wanted to make. If they’d said, “Here’s a claim, maybe it’s iffy but if it’s true, it has some interesting ethical implications…”, that would be one thing. But that’s not what I read their paper as saying. By saying “Recent evidence suggests that capital punishment may have a significant deterrent effect” and not considering the opposite, they’re committing the fallacy of the one-way bet. My third problem is that I think their argument is crap, even setting aside the statistical study. I discussed this a bit in my post. There’s a big issue they’re ignoring, which is that if each execution saves 18 lives, then maybe we should start executing innocent people! Or, hey, we can find some guilty people to execute, maybe some second-degree murderers, armed robbers, arsonists, tax evaders, speeders, jaywalkers, . . . . shouldn’t be too hard to find some more targets–after all, they used to have the death penalty for forgery. Just execute a few hundred of them and consider how many lives will be saved. That may sound silly to you, but it’s Sunstein and Vermeule, not me, who wrote that bit about “a serious commitment to the sanctity of human life.” I discussed the challenges here in more detail in a 2006 post; see the section, “The death penalty as a decision-analysis problem?” My point is not that they have to agree with me, just that it’s not a good sign that their long-ass law article with its thundering about “the sanctity of human life” is more shallow than two paragraphs of a blog post.

In summary regarding the death-penalty article, I’m not slamming them for falling for crappy research (that’s what social scientists and journalists did back in 2005, and lots of them still do to this day) and I’m not slamming them for supporting the death penalty (I’ve supported it too, at various times in my life; more generally I think it depends on the situation and that the death penalty can be a good idea in some circumstances, even if the current version in the U.S. doesn’t work so well). I’m slamming them for taking half-assed reasoning and presenting it as sophisticated. I’d say they don’t know any better, they’re just kinda dumb—but you assure me that Vermeule is actually smart. So my take on it is that they’re really good at playing the academic game. For me to criticize their too-clever-by-half “law and economics” article as not being well thought through, that would be like criticizing LeBron James for not being a golf champion. They do what’s demanded of them in their job.

– Regarding Sunstein’s ability to learn from error: Yes, I mention in my post that Sunstein was persuaded by the article by Wolfers and Donohue. I do think it was good that Sunstein retracted his earlier stance. That’s one reason I was particularly disappointed by what he and his collaborator did in the second edition of Nudge, which was to memory-hole the Wansink episode. It was such a great opportunity in the revision for them to have said that the nudge idea is so compelling that they (and many others) were fooled, and to consider the implications: in a world where people are rewarded for discovering apparently successful nudges, the Wansinks of the world will prosper, at least in the short term. Indeed, Sunstein and Thaler could’ve even put a positive spin on it by talking about the self-correcting nature of science, sunlight is the best disinfectant, etc. But, no, instead they remove it entirely, and then Sunstein returns to his previous credulous self by posting something on what he called the “coolest behavioral finding of 2019.” Earlier they’d referred to Wansink as having had multiple masterpieces. Kind of makes you question their judgment, no? My take on this is . . . for them, everyone’s a friend, so why rock the boat? As I wrote, it looks to me like an alliance of celebrities. I’m guessing that they are genuinely baffled by people like Uri Simonsohn or me who criticize this stuff: Don’t we have anything better to do? It’s natural to think of the behavior of Simonsohn, me, and other “data thugs” as being kinda pathological: we are jealous, or haters, or glory-seekers, or we just have some compulsion to be mean (the kind of people who, in another life, would be Stasi).

– Regarding the Stasi quote: Yes, I agree it’s a good thing Sunstein retracted it. I was not thrilled that in the retraction he said he’d thought it had “a grain of truth,” but, yeah, as retractions go, it was much better than average! Much better than the person who called people “terrorists,” never retracted or apologized, then later published an article lying about a couple of us (a very annoying episode to me, which I have to kind of keep quiet about cos nobody likes a complainer, but grrrr it burns me up, that people can just lie in public like that and get away with it). So, yes, for sure, next time I write about this I will emphasize that he retracted the Stasi line.

– Libertarian paternalism: There’s too much on this for one email, but for my basic take, see this post, in particular the section “Several problems with science reporting, all in one place.” This captures it: Sunstein is all too willing to think that ordinary people are wrong, while trusting the testimony of Wansink, who appears to have been a serial fabricator. It’s part of a world in which normies are stupidly going about their lives doing stupid things, and thank goodness (or, maybe I should say in deference to Vermeule, thank God) there are leaders like Sunstein, Vermeule, and Wansink around to save us from ourselves, and also in the meantime go on NPR, pat each other on the back on Twitter, and enlist the U.S. government in their worthy schemes.

– People are complicated: Vermeule and Sunstein are not “good guys” or “bad guys”; they’re just people. People are complicated. What makes me sad about Sunstein is that, as you said, he does care about evidence, he can learn from error. But then he chooses not to. He chooses to stay in his celebrity comfort zone, making stupid arguments evaluating the president’s job performance based on the stock market, cheerleading biased studies about nudges as if they represent reality. See the last three paragraphs here. Another bad thing Sunstein did recently was to coauthor that Noise book. Another alliance of celebrities! (As a side note, I’m sad to see the collection of academic all-star endorsements that this book received.) Regarding Sunstein himself, see the section “A new continent?” of that post. As I wrote at the time, if you’re going to explore a new continent, it can help to have a local guide who can show you the territory.

Vermeule I know less about; my take is that he’s playing the politics game. He thinks that on balance the Republicans are better than the Democrats, and I’m guessing that when he promotes election fraud misinformation, that he just thinks he’s being mischievous and cute. After all, the Democrats promoted misinformation about police shootings or whatever, so why can’t he have his fun? And, in any case, election security is important, right? Etc etc etc. Anyone with a bit of debate-team experience can justify lots worse than Vermeule’s post-election tweets. I guess they’re not extreme enough for Sunstein to want to stop working with him.

– Other work by Vermeule and Sunstein: They’re well-respected academics, also you and others say how smart they are, so I can well believe they’ve also done high-quality work. It might be that their success in some subfields led them into a false belief that they know what they’re doing in other areas (such as psychology research, statistics, and election administration) where they have no expertise. As the saying goes, sometimes it’s important to know what you don’t know.

My larger concern, perhaps, is that these people get such deference in academia and the news media, that they start to believe their own hype and they think they’re experts in everything.

– Conspiracy theories: Sunstein and Vermeule wrote, “Many millions of people hold conspiracy theories; they believe that powerful people have worked together in order to withhold the truth about some important practice or some terrible event. A recent example is the belief, widespread in some parts of the world, that the attacks of 9/11 were carried out not by Al Qaeda, but by Israel or the United States.” My point here is that there are two conspiracy theories here: a false conspiracy theory that the attacks were carried out by Israel or the United States, and a true conspiracy theory that the attacks were carried out by Al Qaeda. In the meantime, Vermeule has lent his support to unsupported conspiracy theories regarding the 2020 election. So Vermeule is incoherent. On one hand, he’s saying that conspiracy theories are a bad thing. On the other hand, in one place he’s not recognizing the existence of true conspiracies; in another place he’s supporting ridiculous and dangerous conspiracy theories, I assume on the basis that they are in support of his political allies. I don’t think it’s a cheap shot to point out this incoherence.

And what does it mean that Sunstein thinks that “Because those who hold conspiracy theories typically suffer from a ‘crippled epistemology,’ in accordance with which it is rational to hold such theories, the best response consists in cognitive infiltration of extremist groups.”—but he continues to work with Vermeule? Who would want to collaborate with someone who suffers from a crippled epistemology (whatever that means)? The whole thing is hard for me to interpret except as an elitist position where some people such as Sunstein and Vermeule are allowed to believe whatever they want, and hold government positions, while other people get “cognitively infiltrated.”

– The proposed government program: I see your point that when the government is infiltrating dangerous extremist groups, it could make sense for them to try to talk some of these people out of their extremism. After all, for reasons of public safety the FBI and local police are already doing lots of infiltration anyway—they hardly needed Sunstein and Vermeule’s encouragement. Overall I suspect it’s a good thing that the cops are gathering intelligence this way rather than just letting these groups make plans in secret, set off bombs, etc., and once the agents are on the inside, I’d rather have them counsel moderation than do that entrapment thing where they try to talk people into planning crimes so as to be able to get more arrests.

I think what bothers me about the Sunstein and Vermeule article—beyond that they’re worried about conspiracy theories while themselves promoting various con artists and manipulators—is their assumption that the government is on the side of the good. Perhaps this is related to Sunstein being pals with Kissinger. I labeled Sunstein and Vermeule as libertarian paternalists, but maybe Vermeule is better described as an authoritarian; in any case they seem to have the presumption that the government is on their side, whether it’s for nudging people to do good things (not to do bad things) or for defusing conspiracy theories (not to support conspiracy theories).

But governments can’t always be trusted. When I wrote, “They don’t even seem to consider a third option, which is the government actively promoting conspiracy theories,” it’s not that I was saying that this third option was a good thing! Rather, I was saying that the third option is something that’s actually done, and I gave examples of the U.S. executive branch and much of Congress in the period Nov 2020 – Jan 2021, and the Russian government in their invasion of Ukraine. And it seems that Vermeule may well be cool with both these things! So my reaction to Vermeule saying the government should be engaging in information warfare is similar to my reaction when the government proposed to start a terrorism-futures program and have it be run by an actual terrorist: it might be a good idea in theory and even in practice, but (a) these are not the guys I would want in charge of such a program, and (b) their enthusiasm for it makes me suspicious.

– Unrelated to all the above: You say of Vermeule, “after his conversion to Catholicism, he adopted the Church’s line on moral opposition to capital punishment.” That’s funny because I thought the Catholic church was cool with the death penalty—they did the inquisition, right?? Don’t tell me they’ve flip-flopped! Once they start giving in to the liberals on the death-penalty issue, all hell will break loose.

OK, why did I write all that?

1. The mix of social science, statistical evidence, and politics is interesting and important.

2. As an academic, I’m always interested in academics behaving badly, especially when it involves statistics or social science in some way. In particular, the idea that these guys are supposed to be so smart and so nice in regular life, and then they go with these not-so-smart, not-so-nice theories, that’s interesting. When mean, dumb people promote mean, dumb ideas, that’s not so interesting. But when nice, smart people do it . . .

3. It’s been unfair to Sunstein for me to keep bringing up that Stasi thing.

Regarding item 2, one analogy I can see with Vermeule endorsing stupid and dangerous election-fraud claims is dudes in the 60s wearing Che T-shirts and thinking Chairman Mao was cool. From one perspective, Che was one screwed-up dude and Mao was one of history’s greatest monsters . . . but both of them were bad-ass dudes and it was cool to give the finger to the Man. Similarly, Vermeule could well think of Trump as badass, and he probably thinks it’s hilarious to endorse B.S. claims that support his politics. Kinda like how Steven Levitt probably thinks he’s a charming mischievous imp by supporting climate denialists. Levitt would not personally want his (hypothetical) beach house on Fiji to be flooded, but, for a certain kind of person, it’s fun to be a rogue.

Here’s what I wrote when the topic came up before:

There’s no evidence that Vermeule was trying to overthrow the election. He was merely supportive of these efforts, not doing it himself, in the same way that an academic Marxist might root for the general strike and the soviet takeover of government but not be doing anything active on the revolution’s behalf.

The paradox of replication studies: A good analyst has special data analysis and interpretation skills. But it’s considered a bad or surprising thing if you give the same data to different analysts and they come to different conclusions.

Benjamin Kircup writes:

I think you will be very interested to see this preprint that is making the rounds: Same data, different analysts: variation in effect sizes due to analytical decisions in ecology and evolutionary biology (ecoevorxiv.org)

I see several ties to social science, including the study of how data interpretation varies across scientists studying complex systems; but also the sociology of science. This is a pretty deep introspection for a field; and possibly damning. The garden of forking paths is wide. They cite you first, which is perhaps a good sign.

Ecologists frequently pride themselves on data analysis and interpretation skills. If there weren’t any variability, what skill would there be? It would all be mechanistic, rote, unimaginative, uninteresting. In general, actually, that’s the perception many have of typical biostatistics. It leaves insights on the table by being terribly rote and using the most conservative kinds of analytic tools (yet another t-test, etc). The price of this is that different people will reach different conclusions with the same data – and that’s not typically discussed, but raises questions about the literature as a whole.

One point: apparently the peer reviews didn’t systematically reward finding large effect sizes. That’s perhaps counterintuitive and suggests that the community isn’t rewarding bias, at least in that dimension. It would be interesting to see what you would do with the data.

The first thing I noticed is that the paper has about a thousand authors! This sort of collaborative paper kind of breaks the whole scientific-authorship system.

I have two more serious thoughts:

1. Kircup makes a really interesting point, that analysts “pride themselves on data analysis and interpretation skills. If there weren’t any variability, what skill would there be?”, but then it’s considered a bad or surprising thing if you give the same data to different analysts and they come to different conclusions. There really does seem to be a fundamental paradox here. On one hand, different analysts do different things—Pete Palmer and Bill James have different styles, and you wouldn’t expect them to come to the same conclusions; on the other hand, we expect strong results to appear no matter who is analyzing the data.

A partial resolution to this paradox is that much of the skill of data analysis and interpretation comes in what questions to ask. In these replication projects (I think Bob Carpenter calls them “bake-offs”), several different teams are given the same question and the same data and then each do their separate analysis. David Rothschild and I did one of these; it was called We Gave Four Good Pollsters the Same Raw Data. They Had Four Different Results, and we were the only analysts of that Florida poll from 2016 that estimated Trump to be in the lead. Usually, though, data and questions are not fixed, despite what it might look like when you read the published paper. Still, there’s something intriguing about what we might call the Analyst’s Paradox.

2. Regarding his final bit (“apparently the peer reviews didn’t systematically reward finding large effect sizes”), I think Kircup is missing the point. Peer reviews don’t systematically reward finding large effect sizes. What they systematically reward is finding “statistically significant” effects, i.e., those that are at least two standard errors from zero. But by restricting yourself to those, you automatically overestimate effect sizes, as I discussed at interminable length in papers such as Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors and The failure of null hypothesis significance testing when studying incremental changes, and what to do about it. So they are rewarding bias, just indirectly.
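To make that concrete, here’s a minimal simulation (mine, not anything from the preprint or the papers above; the numbers are arbitrary) of what the significance filter does when the true effect is small relative to the noise:

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect = 0.1   # a small true effect (arbitrary, for illustration)
se = 0.5            # standard error of each study's estimate
n_sims = 100_000    # number of simulated studies

# Each simulated study produces an unbiased but noisy estimate of the same effect
estimates = rng.normal(true_effect, se, n_sims)

# The significance filter: keep only estimates at least two standard errors from zero
significant = estimates[np.abs(estimates) > 2 * se]

print("mean of all estimates:        ", round(estimates.mean(), 3))
print("mean of significant estimates:", round(significant.mean(), 3))
print("share of significant estimates with the wrong sign:",
      round((significant < 0).mean(), 3))
```

Each individual estimate is unbiased, but the ones that survive the filter exaggerate the true effect several times over (Type M error), and a nontrivial fraction of them even point in the wrong direction (Type S error).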

The importance of measurement, and how you can draw ridiculous conclusions from your statistical analyses if you don’t think carefully about measurement . . . Leamer (1983) got it.

[Graph from the paper discussed below: a bizarre fitted regression curve, including one region with a purported life expectancy of 91.]

Jacob Klerman writes:

I have noted your recent emphasis on the importance of measurement (e.g., “Here are some ways to make your study replicable…”). For reasons not relevant here, I was rereading Leamer (1983), Let’s Take the Con Out of Econometrics—now 40 years old. It’s a fun, if slightly dated, paper that you seem to be aware of.

Leamer also makes the measurement point (emphasis added):

When the sampling uncertainty S gets small compared to the misspecification uncertainty M, it is time to look for other forms of evidence, experiments or nonexperiments. Suppose I am interested in measuring the width of a coin, and I provide rulers to a room of volunteers. After each volunteer has reported a measurement, I compute the mean and standard deviation, and I conclude that the coin has width 1.325 millimeters with a standard error of .013. Since this amount of uncertainty is not to my liking, I propose to find three other rooms full of volunteers, thereby multiplying the sample size by four, and dividing the standard error in half. That is a silly way to get a more accurate measurement, because I have already reached the point where the sampling uncertainty S is very small compared with the misspecification uncertainty M. If I want to increase the true accuracy of my estimate, it is time for me to consider using a micrometer. So too in the case of diet and heart disease. Medical researchers had more or less exhausted the vein of nonexperimental evidence, and it became time to switch to the more expensive but richer vein of experimental evidence.

Interesting. Good to see examples where ideas we talk about today were already discussed in the classic literature. I indeed think measurement is important and is under-discussed in statistics. Economists are very familiar with the importance of measurement, both in theory (textbooks routinely discuss the big challenges in defining, let alone measuring, key microeconomic quantities such as “the money supply”) and in practice (data gathering can often be a big deal, involving archival research, data quality checking, etc., even if unfortunately this is not always done), but then once the data are in, data quality and issues of bias and variance of measurement often seem to be forgotten. Consider, for example, this notorious paper where nobody at any stage in the research, writing, reviewing, revising, or editing process seemed to be concerned about that region with a purported life expectancy of 91 (see the above graph)—and that doesn’t even get into the bizarre fitted regression curve. But, hey, p less than 0.05. Publishing and promoting such a result based on the p-value represents some sort of apogee of trusting implausible theory over realistic measurement.
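Leamer’s coin example is easy enough to simulate; here’s a quick sketch (the specific numbers are mine, made up for illustration) in which every ruler reading shares a small systematic error:

```python
import numpy as np

rng = np.random.default_rng(1)

true_width = 1.300   # the coin's true width in millimeters (made up)
bias = 0.025         # systematic error shared by every reading (misspecification)
noise_sd = 0.10      # reading-to-reading variation across volunteers (sampling noise)

for n in [100, 400, 1600]:
    readings = true_width + bias + rng.normal(0, noise_sd, n)
    estimate = readings.mean()
    std_err = readings.std(ddof=1) / np.sqrt(n)
    print(f"n = {n:5d}  estimate = {estimate:.4f}  "
          f"std err = {std_err:.4f}  actual error = {abs(estimate - true_width):.4f}")
```

Quadrupling the number of volunteers dutifully halves the reported standard error each time, but the actual error stalls at the bias. Once sampling uncertainty is small relative to misspecification uncertainty, the fix is better measurement (the micrometer), not more data.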

Also, if you want a good story about why it’s a mistake to think that your uncertainty should just go like 1/sqrt(n), check out this story which is also included in our forthcoming book, Active Statistics.

“My view is that if I can show that a result was cooked and that doing it correctly does not yield the answer the authors claimed, then the result is discredited. . . . What I hear, instead, is the following . . .”

Economic historian Tim Guinnane writes:

I have a general question that I have not seen addressed on your blog. Often this question turns into a narrow question about retracting papers, but I think that short-circuits an important discussion.

Like many in economic history, I am increasingly worried that much research in recent years reflects p-hacking, misrepresentation of the history, useless data, and other issues. I realize that the technical/statistical issues differ from paper to paper.

What I see is something like the following. You can use this paper as a concrete example, but the problems are much more widespread. We document a series of bad research practices. The authors played games with controls to get the “right” answer for the variable of interest. (See Table 1 of the paper). In the text they misrepresent the definitions of variables used in regressions; we show that if you use the stated definition, their results disappear. They use the wrong degrees of freedom to compute error bounds (in this case, they had to program the bounds by hand, since stata automatically uses the right df). There are other and to our minds more serious problems involved in selectively dropping data, claiming sources do not exist, etc.

Step back from any particular problem. How should the profession think about claims such as ours? My view is that if I can show that a result was cooked and that doing it correctly does not yield the answer the authors claimed, then the result is discredited. The journals may not want to retract such work, but there should be support for publishing articles that point out such problems.

What I hear, instead, is the following. A paper estimates beta as .05 with a given SE. Even if we show that this is cooked—that is, that beta is a lot smaller or the SE a lot larger if you do not throw in extraneous regressors, or play games with variable definitions—then ours is not really a result. It is instead, I am told, incumbent on the critic to start with beta=.05 as the null, and show that doing things correctly rejects that null in favor of something less than .05 (it is characteristic of most of this work that there really is no economic theory, so the null is always “X does not matter” which boils down to “this beta is zero.” And very few even tell us whether the correct test is one- or two-sided).

This pushback strikes me as weaponizing the idea of frequentist hypothesis testing. To my mind, if I can show that beta=.05 comes from a cooked regression, then we need to start over. That estimate can be ignored; it is just one of many incorrect estimates one can generate by doing things inappropriately. It actually gives the unscrupulous an incentive to concoct more outlandish betas which are then harder to reject. More generally, it puts a strange burden of proof on critics. I have discussed this issue with some folks in natural sciences who find the pushback extremely difficult to understand. They note what I think is the truth: it encourages bad research behavior by suppressing papers that demonstrate that bad behavior.

It might be opportune to have a general discussion of these sorts of issues on your website. The Gino case raises something much simpler, I think. I fear that it will in some ways lower the bar: so long as someone is not actively making up their data (which I realize has not been proven, in case this email gets subpoenaed!) then we do not need to worry about cooking results.

My reply: You raise several issues that we’ve discussed on occasion (for some links, see here):

1. The “Research Incumbency Rule”: Once an article is published in some approved venue, it is taken as truth. Criticisms which would absolutely derail a submission in pre-publication review can be brushed aside if they are presented after publication. This is what you call “the burden of proof on critics.”

2. Garden of forking paths.

3. Honesty and transparency are not enough. Work can be non-fraudulent but still be crap.

4. “Passive corruption” when people know there’s bad work but they don’t do anything about it.

5. A disturbingly casual attitude toward measurement; see here for an example: https://statmodeling.stat.columbia.edu/2023/10/05/no-this-paper-on-strip-clubs-and-sex-crimes-was-never-gonna-get-retracted-also-a-reminder-of-the-importance-of-data-quality-and-a-reflection-on-why-researchers-often-think-its-just-fine-to-publ/ Many economists and others seem to have been brainwashed into thinking that it’s ok to have bad measurement because attenuation bla bla . . . They’re wrong.

He responded: If you want an example of economists using stunningly bad data and making noises about attenuation, see here.

The paper in question has the straightforward title, “We Do Not Know the Population of Every Country in the World for the Past Two Thousand Years.”

What’s the problem, “math snobs” or rich dudes who take themselves too seriously and are enabled in that by the news media?

Chris Barker, the chair of the Statistical Consulting Section of the American Statistical Association, writes:

I’m curious about your reaction/opinion to a Financial Times article I read today about Sam Bankman-Fried (“SBF,” charged with fraud in the loss of several billion in crypto) with pointless insults about mathematicians (“mathematical chauvinists,” “math snobs,” “mental arithmetic,” and what seems to be a claim that a “math snob” caused a decline in the UK economy). And disparagement at the potential use or abuse of Bayesian statistics. From the Financial Times article:

We must leave it to the criminal courts to decide the future of Sam Bankman-Fried. He denies the various charges against him. For now, I am less concerned with his specific doings than with his worldview, which is a sort of mathematical chauvinism. A theme in Michael Lewis’s new book about “SBF” is the subject’s mistrust of what cannot be quantified. Shakespeare’s supposed primacy in literature, for example. “What are the odds that the greatest writer was born in 1564?” SBF is quoted as asking, citing the billions of people who have been born since then, and the higher share of them who are educated. These are his “Bayesian priors”. I hope to never encounter a starker case of abstract reasoning getting in the way of practical observation.

He is, if nothing else, of his time. A year ago this weekend, Liz Truss, a maths snob who assailed colleagues with mental arithmetic questions, fell as UK premier, almost taking the economy with her. If we consider, too, the dark, Kremlin-partial end of finance bro politics, these are the most embarrassing times for maths chauvinists since Robert McNamara, who even looked geometric and dug America ever deeper into the pit of Vietnam on the back of data.

I defer to the lexicographers or the relevant experts as to whether this is the first, or if not, for how long these insults against mathematicians and statisticians have appeared in the media.

I replied: Yes, the Financial Times article seems pretty bad to me; indeed, it just seems too stupid to deserve a response! Is the author of the column well-respected in Britain? I did some googling and he seems just like a generic hack political columnist.

Barker replied:

The demonization of math and statistics was disappointing. There is no way to ever know what the editors were thinking by permitting publication of that article, nor do I particularly care. In other areas of the internet that article might simply be called “click bait.”

He also points to this quote from “SBF”:

I could go on and on about the failings of Shakespeare and the constitution and Stradivarius violins, and at the bottom of this post I do, but really I shouldn’t need to: the Bayesian priors are pretty damning. About half of the people born since 1600 have been born in the past 100 years, but it gets much worse than that. When Shakespeare wrote almost all of Europeans were busy farming, and very few people attended university; few people were even literate—probably as low as about ten million people. By contrast there are now upwards of a billion literate people in the Western sphere. What are the odds that the greatest writer would have been born in 1564? The Bayesian priors aren’t very favorable.

I agree with everyone else that this represents a misapplication of Bayesian methods but for a kind of subtle reason. The numerator/denominator thing is ok; the real problem is in the premise, which is that there’s something called “the greatest writer.” Was William Shakespeare a greater writer than Veronica Geng? How could you even answer such a question? And, mathematically, you can only apply Bayesian inference to a problem that is well defined.
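For what it’s worth, the numerator/denominator part is trivial to write down. Here’s the arithmetic under SBF’s own assumption (his, not mine) that the prior probability of a cohort producing “the greatest writer” is proportional to its literate population, using the rough figures from his quote:

```python
# Rough figures taken from the SBF quote above; these are his numbers,
# not real demographic estimates.
literate_circa_1600 = 10e6   # "probably as low as about ten million people"
literate_now = 1e9           # "upwards of a billion literate people"

# Assumption (SBF's, not mine): each literate person is equally likely
# a priori to be "the greatest writer," so the prior odds are just a ratio.
prior_odds = literate_circa_1600 / literate_now
print(f"prior odds for Shakespeare's cohort: about 1 in {1 / prior_odds:.0f}")
```

The arithmetic is fine as far as it goes; the trouble, as noted above, is that “the greatest writer” isn’t a well-defined event to which any probability can be attached.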

The general problem to me is not SBF’s asinine “Bayesian priors” quote—if it wasn’t that, he’d be wielding other mysterious power phrases such as “the subconscious” or “comparative advantage” or “quantum leap” or “inflection point” or whatever—but rather the well-known phenomenon of rich people thinking they know what they’re talking about when they’re actually just making things up in some nonsensical way.

P.S. But, yeah, there is a history of stupid arguments being made with a Bayesian connection.

The free will to repost

Jonathan “no Trump” Falk points to this press release and writes:

Scientist, after decades of study, concludes: We don’t have free will. Does that include the decision to write a book about free will?

PS … A quick mention of the replication crisis to silence some doubters.

PPS: I couldn’t help sending this to you.

I replied with a link to this post from last year, which in turn linked to a post by Kevin Mitchell, who wrote:

Gotta hand it to Sapolsky here . . . it’s quite ballsy to uber-confidently assert we do not have “the slightest scrap of agency” and then support that with one discredited social psych study after another…

Bad stuff going down at the American Sociological Association

I knew the Association for Psychological Science, the American Psychological Association, the American Political Science Association, the American Statistical Association, and the National Academy of Sciences had problems. It turns out the American Sociological Association does some bad things too.

Philip Cohen has the story. It starts back in 2019, when the American Sociological Association, along with “many other paywall-dependent academic societies” (in Cohen’s words) sent an open letter to the president to oppose open science. Here’s Cohen:

At the time, there was a rumor that OSTP [the U.S. Office of Science and Technology Policy] would require agencies to make public the results of research funded by the federal government without a 12-month delay — the cherished “embargo” that allowed these associations to profit from delaying access to public knowledge . . .

They wrote: “We are writing to express our concerns about a possible change in federal policies that could significantly threaten a vibrant American scientific enterprise.” That is, by requiring free access to research, OSTP would threaten the “financial stability that enables us to support peer review that ensures the quality and integrity of the research enterprise.” If ASA lost their journal subscription profits, in other words, American science would die. “To take action to shorten the 12-month embargo… risks the continued international leadership for the U.S. scientific enterprise.”

Uh huh. I agree with Cohen that this is some combination of ridiculous and offensive. He continues:

Despite a petition signed by many ASA members, and a resolution from its own Committee on Publications “to express opposition to the decision by the ASA to sign the December 18, 2019 letter” — which the ASA leadership never even publicly acknowledged — ASA has not uttered a word to alter its anachronistic and unpopular position.

It’s starting to make me wonder if academic cartels sometimes act like . . . cartels?

Just to be clear, this does not seem to be a problem with academic sociology as a profession. As Cohen notes, the ASA’s own Committee on Publications opposed the ASA’s horrible recommendation to keep science closed.

Putting it all into perspective

We live in a world where political leaders start wars, companies and governments dump toxic waste, church leaders cover up child abuse, etc. In comparison, universities and academic societies faking statistics, rewarding plagiarism and other scientific misconduct, restricting data, and otherwise mucking up the process of scholarly inquiry . . . that barely registers on the scale of institutionalized evil.

So what is it that’s so irritating about academic institutions behaving badly? I can think of a few things:

1. I work in academia so I’m made aware of these issues and feel some bit of collective responsibility for them.

2. Academia is more open than much of business, government, and organized religion, so it’s easier for us to see the problems.

3. So much of the enabling of cheating in academia just seems so pointless. It’s not cool when companies pollute, but, hey, you can see the reason$ they’ll want to do so. But what does the American Sociological Association get out of fighting against open science, what does the University of California get out of tolerating research misconduct, what do the American Statistical Association and American Political Science Association get out of rewarding plagiarists? Nothing. That’s what’s so damn pitiful.

When Lysenko did his part to destroy Soviet agriculture, at least he personally got something out of it. These American Sociological Association etc dudes, they get nothing.

It’s really pitiful, when you think about it. These people aren’t evil, they’re pathetic.

“And while I don’t really want a back-and-forth . . .”

A few months ago we had an interesting discussion about evaluation of pollsters, following up on some thoughts of Elliott Morris and Nate Silver, two analysts I respect and with whom I’ve collaborated (on separate occasions). In recent years I’ve become annoyed with Nate from time to time, but, hey, nobody’s perfect and I still think he’s generally a reasonable person.

I had a new feeling of frustration, though, when in one of his recent posts involving the pollsters, Nate wrote, “So take that as a signal that I don’t intend this a back-and-forth.” And then, more recently, in the context of a completely different dispute with someone else, Nate wrote, “And while I don’t really want a back-and-forth . . .”

I get it that everyone’s busy and you don’t have time to respond to every argument that comes across your desk, but . . . back-and-forths are good, no?

Some googling turned up this quote from 2012, which I agree with:

Silver tells TechCrunch that intelligent prediction is messy, biased, and iterative — all the characteristics that don’t lend themselves to grand pronouncements in 30-second soundbites. Blogs, instead, lend themselves to an honest back-and-forth about the sausage of statistical conclusions, which can, hopefully, create a more respected class of experts and a more informed public. . . .

The “iterative” thing is good too. In complicated problems, our methods are always flawed. So, wherever we happen to be right now, we should welcome criticism and opportunities for improvement.

This is a point that I and others have made over the years, for example:

Blogging also has the benefit that the discussion can go back and forth. In contrast, the journal reviewing process is very slow, and once an article is published, it typically just sits there. . . .

This was before twitter, which has the different problem that most of the volume of posts is people cheering or booing. See enough of that and you too will want to cut short all the back-and-forth.

What happened to Nate between 2012, when he explicitly talked about the benefits of “an honest back-and-forth about the sausage of statistical conclusions,” and 2020, when he avoided discussions about problems with our forecasting models, and 2023, when he “doesn’t really want a back-and-forth”?

I don’t have any specific information regarding Nate, but in any case I’m more interested in the general phenomenon of when it is that public figures want to engage in back-and-forths and when they don’t.

My theory is that if you’re a pundit and you become famous, you attract lots and lots of stupid criticism. This happened to Paul Krugman, it happened to David Brooks, and it happened to Nate Silver too. You get lots of uninformed people attacking you because they misunderstand what you’re doing or you were pooh-poohing their favorite JFK-assassination theory or UFO’s-as-space-aliens theory or you’re not taking their exact political position on the issue of the day or whatever. And so you develop . . . not a thick skin, exactly, but an acceptance that you have neither the time nor the interest to respond to all the uninformed and possibly insincere criticism that you’re receiving.

At some point you realize that you’re piloting a submarine through a poop-filled sea and you can’t spend the rest of your life trying to keep the hull clean.

At the same time, you’re getting some thoughtful criticism! Some of it is framed in a very deferential way to you, some of it is direct and polite, and some of it is downright rude but still thoughtful. But it doesn’t matter: you’ve already turned off your reply instinct. So even when you feel forced to reply (for example, person A criticizes something you said and then B, C, D, E, F, etc. join in the twitter thread and ask what is your response), you do so reluctantly, with annoyance, and you reiterate that you “don’t really want a back-and-forth.”

In summary, the “I don’t really want a back-and-forth” attitude makes me sad, but I think I understand where it is coming from. And if people don’t want that sort of discussion, that’s their choice.

“My quick answer is that I don’t care much about permutation tests because they are testing a null null hypothesis that is typically of no interest”

Riley DeHaan writes:

I’m a psych PhD student and I have a statistical question that’s been bothering me for some time and wondered if you’d have any thoughts you might be willing to share.

I’ve come across some papers employing z-scores of permutation null distributions as a primary metric in neuroscience (for an example, see here).

The authors computed a coefficient of interest in a multiple linear regression and then permuted the order of the predictors to obtain a permutation null distribution of that coefficient. “The true coefficient for functional connectivity is compared to the distribution of null coefficients to obtain a z-score and P-value.” The authors employed this permutation testing approach to avoid the need to model potentially complicated autocorrelations between the observations in their sample and then wanted a statistic that provided a measure of effect size rather than relying solely on p-values.

Is there any meaningful interpretation of a z-score of a permutation null distribution under the alternative hypothesis? Is this a commonly used approach? This approach would not appear to find meaningfully normalized estimates of effect size given the variability of the permutation null distribution may not have anything to do with the variance of the statistic of interest under its own distribution. In this case, I’m not sure a z-score based on the permutation null provides much information beyond significance. The variability of the permutation null distribution will also be a function of the sample size in this case. Could we argue that permutation null distributions would in many cases (I’m thinking about simple differences in means rather than regression coefficients) tend to overestimate the variability of the true statistic given permutation tests are conservative compared to tests based on known distributions of the statistic of interest? This z-score approach would then tend to produce conservative effect sizes. I’m not finding references to this approach online beyond this R package.

My reply: My quick answer is that I don’t care much about permutation tests because they are testing a null null hypothesis that is typically of no interest. Related thoughts are here.

P.S. If you, the reader of this blog, care about permutation tests, that’s fine! Permutation tests have a direct mathematical interpretation. They just don’t interest me, that’s all.
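For readers who do care, here’s a minimal sketch of the kind of procedure DeHaan describes (a generic permutation test for one regression coefficient, not the specific pipeline in the linked paper): shuffle the predictor of interest across observations, refit the regression each time, and compare the observed coefficient to the resulting null distribution.

```python
import numpy as np

rng = np.random.default_rng(2)

# Fake data: the outcome depends weakly on x1, plus a covariate x2
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 0.2 * x1 + 0.5 * x2 + rng.normal(size=n)

def coef_on_x1(x1_col):
    """Least-squares coefficient on x1 in a regression of y on (intercept, x1, x2)."""
    X = np.column_stack([np.ones(n), x1_col, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

observed = coef_on_x1(x1)

# Permutation null: break the x1-y link by shuffling x1, keeping everything else fixed
null_coefs = np.array([coef_on_x1(rng.permutation(x1)) for _ in range(2000)])

z = (observed - null_coefs.mean()) / null_coefs.std(ddof=1)
p = (np.abs(null_coefs) >= abs(observed)).mean()
print(f"observed coefficient: {observed:.3f}")
print(f"permutation z-score:  {z:.2f}")
print(f"permutation p-value:  {p:.3f}")
```

As DeHaan notes, the denominator of that “z-score” is the spread of the permutation null, not the sampling variability of the coefficient itself, so it’s best read as a scaled test statistic rather than a normalized effect size.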

The appeal of New York Times columnist David Brooks . . . Yeah, I know this all sounds like a nutty “it’s wheels within wheels, man” sort of argument, but I’m serious here!

Over the years, we’ve written a bit about David Brooks on this blog, originally because he had interesting things to say about a topic I care about (Red State Blue State) and later because people pointed out to me various places where he made errors and then refused to correct them, something that bothered me for its own sake (correctable errors in the paper of record!) and as part of a larger phenomenon which I described as Never back down: The culture of poverty and the culture of journalism. At an intellectual level, I understand why pundits are motivated to not ever admit error, also I can see how they can get into the habit of shunting criticism aside because they get so much of it; still, I get annoyed.

Another question arises, though, which is how is it that Brooks has kept his job for so long? I had a recent discussion with Palko on this point.

The direct answer to why Brooks stays employed is that he’s a good writer, regularly turns in his columns on time, continues to write on relevant topics, and often has interesting ideas. Sure, he makes occasional mistakes, but (a) everyone makes mistakes, and when they appear in a newspaper with a circulation of millions, people will catch these mistakes, and (b) newspapers in general, and the Times in particular, are notorious for only very rarely running corrections, so Brooks making big mistakes and not correcting himself is not any kind of disqualification.

In addition, Palko wrote:

For the target audience [of the Times, Brooks offers] a nearly ideal message. It perfectly balances liberal guilt with a sense of class superiority.

I replied with skepticism of Palko’s argument that Brooks’s continued employment comes from his appeal to liberals.

I suspect that more of it is the opposite, that Brooks is popular among conservatives because he’s a conservative who conservatives think can appeal to liberals.

Kinda like the appeal of Michael Moore to liberals: Moore’s the sort of liberal who liberals think can appeal to conservatives.

I like this particular analogy partly because I imagine that it would piss off both Brooks and Moore (not that either of them will ever see this post).

Palko responded:

But it’s not conservatives who keep hiring him.

Brooks’ breakthrough was in the Atlantic, the primary foundation of his career is his long-time day job with the NYT, and his largest audience probably comes from PBS News Hour.

To which I replied as follows:

First off, I don’t know whether the people who are hiring Brooks are liberal, conservative, or somewhere in between. In any case, if they’re conservative, I’m pretty sure they’re only moderately so: I say this because I don’t think the NYT op-ed page has any columnists who supported the Jan 6 insurrection or who claim that Trump actually won the 2020 election etc.

It’s my impression that one reason Brooks was hired, in addition to his ability to turn in readable columns on time, was (a) he’s had some good ideas that have received a lot of attention (for example, the whole bobo stuff, his red-state, blue-state stuff), and (b) most of their op-ed columnists have been liberal or centrist, and they want some conservatives for balance.

Regarding (a), yes, he’s said a lot of dumb things, but I’d say he still has had some good ideas. He’s kinda like Gladwell in that he speculates with an inappropriate air of authority, but his confidence can sometimes get him to interesting places that a more careful writer might never reach.

Regarding (b), it’s relevant that many conservatives are fans of Brooks (for example here, here, and here). If the NYT is going to hire a conservative writer for balance, they’ll want to hire a conservative writer who conservatives like. Were they to hire a writer who conservatives hate, they wouldn’t be doing a good job of satisfying their goal of balance.

So, whoever is in charge of hiring Brooks and wherever his largest audience is, I think that a key to his continued employment is that he is popular among conservatives because he’s a conservative who conservatives think can appeal to liberals.

Yeah, I know this all sounds like a nutty “it’s wheels within wheels, man” sort of argument, but I’m serious here!

This post is political science

The point of posting this is not to talk more about Brooks—if you’re interested in him, you can read his column every week—but rather to consider some of these indirect relationships here, the idea that a publication with liberal columnists will hire a conservative who is then chosen in large part because conservatives see him as the sort of conservative who will appeal to liberals. I don’t think this happens so much in the opposite direction, because if a publication has lots of conservative columnists, that’s probably because it’s an explicitly conservative publication so they wouldn’t want to employ any liberals at all. There must be some counterexamples to that, though.

And I do think there’s some political science content here, related to this discussion I wrote with Gross and Shalizi, but I’ve struggled with how to address the topic more systematically.

The immediate victims of the con would rather act as if the con never happened. Instead, they’re mad at the outsiders who showed them that they were being fooled.

Dorothy Bishop has the story about “a chemistry lab in CNRS-Université Sorbonne Paris Nord”:

More than 20 scientific articles from the lab of one principal investigator have been shown to contain recycled and doctored graphs and electron microscopy images. That is, results from different experiments that should have distinctive results are illustrated by identical figures, with changes made to the axis legends by copying and pasting numbers on top of previous numbers. . . . the problematic data are well-documented in a number of PubPeer comments on the articles (see links in Appendix 1 of this document).

The response by CNRS [Centre National de la Recherche Scientifique] to this case . . . was to request correction rather than retraction of what were described as “shortcomings and errors”, to accept the scientist’s account that there was no intentionality, despite clear evidence of a remarkable amount of manipulation and reuse of figures; a disciplinary sanction of exclusion from duties was imposed for just one month.

I’m not surprised. The sorts of people who will cheat on their research are likely to be the same sorts of people who will instigate lawsuits, start media campaigns, and attack in other ways. These are researchers who’ve already shown a lack of scruple and a willingness to risk their careers; in short, they’re loose cannons, scary people, so it can seem like the safest strategy to not try to upset them too much, not trap them into a corner where they’ll fight like trapped rats. I’m not speaking specifically of this CNRS researcher—I know nothing of the facts of this case beyond what’s reported in Bishop’s post—I’m just speaking to the mindset of the academic administrators who would just like the problem to go away so they can get on with their regular jobs.

But Bishop and her colleagues were annoyed. If even blatant examples of scientific misconduct cannot be handled straightforwardly, what does this say about the academic and scientific process more generally? Is science just a form of social media, where people can make any sort of claim and evidence doesn’t matter?

They write:

So what should happen when fraud is suspected? We propose that there should be a prompt investigation, with all results transparently reported. Where there are serious errors in the scientific record, then the research articles should immediately be retracted, any research funding used for fraudulent research should be returned to the funder, and the person responsible for the fraud should not be allowed to run a research lab or supervise students. The whistleblower should be protected from repercussions.

In practice, this seldom happens. Instead, we typically see, as in this case, prolonged and secret investigations by institutions, journals and/or funders. There is a strong bias to minimize the severity of malpractice, and to recommend that published work be “corrected” rather than retracted.

Bishop and her colleagues continue:

One can see why this happens. First, all of those concerned are reluctant to believe that researchers are dishonest, and are more willing to assume that the concerns have been exaggerated. It is easy to dismiss whistleblowers as deluded, overzealous or jealous of another’s success. Second, there are concerns about reputational risk to an institution if accounts of fraudulent research are publicised. And third, there is a genuine risk of litigation from those who are accused of data manipulation. So in practice, research misconduct tends to be played down.

But:

This failure to act effectively has serious consequences:

1. It gives credibility to fictitious results, slowing down the progress of science by encouraging others to pursue false leads. . . . [and] erroneous data pollutes the databases on which we depend.

2. Where the research has potential for clinical or commercial application, there can be direct damage to patients or businesses.

3. It allows those who are prepared to cheat to compete with other scientists to gain positions of influence, and so perpetuate further misconduct, while damaging the prospects of honest scientists who obtain less striking results.

4. It is particularly destructive when data manipulation involves the Principal Investigator of a lab. . . . CNRS has a mission to support research training: it is hard to see how this can be achieved if trainees are placed in a lab where misconduct occurs.

5. It wastes public money from research grants.

6. It damages public trust in science and trust between scientists.

7. It damages the reputation of the institutions, funders, journals and publishers associated with the fraudulent work.

8. Whistleblowers, who should be praised by their institution for doing the right thing, are often made to feel that they are somehow letting the side down by drawing attention to something unpleasant. . . .

What happened next?

It’s the usual bad stuff. They received a series of stuffy bureaucratic responses, none of which addressed any of items 1 through 8 above, let alone the problem of the data, which by all appearances were blatantly faked. Just disgusting.

But I’m not surprised. We’ve seen it many times before:

– The University of California’s unresponsive response when informed of research misconduct by their star sleep expert.

– The American Political Science Association refusing to retract an award given to an author for a book with plagiarized material, or even to retroactively have the award shared with the people whose material was copied without acknowledgment.

– The London Times never acknowledging the blatant and repeated plagiarism by its celebrity chess columnist.

– The American Statistical Association refusing to retract an award given to a professor who plagiarized multiple times, including from wikipedia (in an amusing case where he created negative value by introducing an error into the material he’d copied, so damn lazy that he couldn’t even be bothered to proofread his pasted material).

– Cornell University . . . ok they finally canned the pizzagate dude, but only after emitting some platitudes. Kind of amazing that they actually moved on that one.

– The Association for Psychological Science: this one’s personal for me, as they ran an article that flat-out lied about me and then refused to correct it just because, hey, they didn’t want to.

– Lots and lots of examples of people finding errors or fraud in published papers and journals refusing to run retractions or corrections or even to publish letters pointing out what went wrong.

Anyway, this is one more story.

What gets my goat

What really annoys me in these situations is how the institutions show loyalty to the people who did research misconduct. When researcher X works at or publishes with institution Y, and it turns out that X did something wrong, why does Y so often try to bury the problem and attack the messenger? Y should be mad at X; after all, it’s X who has leveraged the reputation of Y for his personal gain. I’d think that the leaders of Y would be really angry at X, even angrier than people from the outside. But it doesn’t happen that way. The immediate victims of the con would rather act as if the con never happened. Instead, they’re mad at the outsiders who showed them that they were being fooled. I’m sure that Dan Davies would have something to say about all this.

Since Jeffrey Epstein is in the news again . . .

I came across this from a bit more than a year ago which is also relevant to today’s earlier post on “The Simple Nudge That Raised Median Donations by 80%.” Here it is:

Nudge meets Edge: A Boxing Day Story

I happened to come across this post from 2011 about an article from one of the Nudgelords promoting the ridiculous “traditional” idea of modeling risk aversion as “a primitive; each person had a parameter, gamma, that measured her degree of risk aversion.” That was before I had a full sense of how silly/dangerous the whole nudge thing was (see also here) . . . but, also, it featured a link to the notorious Edge foundation, home of Jeffrey Epstein and his pals. All those Great Men; there’s hardly enough room at NPR and Ted to hold all of them.
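For readers who haven’t run into it, the “parameter gamma” being mocked is, in its standard textbook form, the constant-relative-risk-aversion utility; this gloss is mine, not something quoted in the post under discussion:

```latex
% Textbook CRRA utility with risk-aversion parameter gamma (supplied here
% for reference; not quoted from the post being criticized):
u(x) =
\begin{cases}
  \dfrac{x^{1-\gamma}}{1-\gamma}, & \gamma \neq 1, \\[4pt]
  \log x, & \gamma = 1.
\end{cases}
```

The objection in that 2011 post is to treating a single gamma per person as a stable primitive, not to the algebra itself.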

Again, why am I picking on these guys? The Edge foundation: are these not the deadest of dead horses? But remember what they say about beating a dead horse. The larger issue—a smug pseudo-humanistic contempt for scientific measurement, along with an attitude that money + fame = truth—that’s still out there.

What to trust in the newspaper? Example of “The Simple Nudge That Raised Median Donations by 80%”

Greg Mayer points to this news article, “The Simple Nudge That Raised Median Donations by 80%,” which states:

A start-up used the Hebrew word “chai” and its numerical match, 18, to bump up giving amounts. . . . It’s a common donation amount among Jews — $18, $180, $1,800 or even $36 and other multiples.

So Daffy lowered its minimum gift to $18 and then went further, prompting any donor giving to any Jewish charity to bump gifts up by some related amount. Within a year, median gifts had risen to $180 from $100. . . .

I see several warning signs here:

1. “Within a year, median gifts had risen to $180 from $100.” This is a before/after change, not a direct comparison of outcomes; a sketch after this list shows how such a jump can arise with no nudge effect at all.

2. No report, just a quoted number that could easily have been made up. Yes, the numbers in a report can be fabricated too, but that takes more work and carries more risk. Making up numbers when talking with a reporter, that’s easy.

3. The people who report the number are motivated to claim success; the reporter is motivated to report a success. The article is filled with promotion for this company: it’s a short piece that mentions “Daffy” six times, for example this bit, which reads like a straight-up ad:

If you have children, grandchildren, nieces or nephews, there’s another possibility. Daffy has a family plan that allows children to prompt their adult relatives to support a cause the children choose. Why not put the app on their iPhones or iPads so they can make suggestions and let, for example, a 12-year-old make $12 donations to 12 nonprofits each year?

Why not, indeed? Even better, why not have them make their donations directly to Daffy and cut out the middleman?? Look, I’m not saying that the people behind Daffy are doing anything wrong; it’s just that this is public relations, not journalism.

4. Use of the word “nudge” in the headline is consistent with business-press hype. Recall that “nudge” is a subfield whose proponents are well connected in the media and routinely make exaggerated claims.

So, yeah, an observational comparison with no documentation, in an article that’s more like an advertisement, that’s kinda sus. Not that the claim is definitely wrong, there’s just no good reason for us to take it seriously.
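Here’s the sketch promised in warning sign 1: a minimal simulation with made-up numbers (nothing below comes from Daffy or the Times article) in which the mix of donors shifts over the year and the before/after median jumps from $100 to $180 even though the prompt changes nobody’s behavior.

```python
# Minimal sketch with made-up numbers (not Daffy's data): a before/after
# comparison of median gifts can move from $100 to $180 purely because the
# mix of donors changes over the year, with no "nudge" effect on anyone.
import numpy as np

rng = np.random.default_rng(1)

# Year 1: gift amounts drawn mostly from smaller donors.
before = rng.choice([50, 100, 180], size=1000, p=[0.30, 0.50, 0.20])

# Year 2: same three gift sizes, but the platform now attracts relatively
# more large donors (a composition change, not a treatment effect).
after = rng.choice([50, 100, 180], size=1000, p=[0.10, 0.35, 0.55])

print("median before:", np.median(before))  # expect 100
print("median after: ", np.median(after))   # expect 180
```

That doesn’t show the nudge did nothing; it just shows that a before/after median, on its own, can’t distinguish the two stories.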

Progress in 2023

Published:

Unpublished:

Enjoy.

Clarke’s Law, and who’s to blame for bad science reporting

Lizzie blamed the news media for a horrible bit of news reporting on the ridiculous claim that “the climate crisis is causing certain grapes, used in almost all champagne, to be on the brink of extinction.” The press got conned by a press release from a sleazy company, which in this case was “a Silicon Valley startup” but in other settings could be a pollster or a car company or a university public relations office or an advocacy group or some other institution that has a quasi-official role in our society.

Lizzie was rightly ticked off by the media organizations that were happily playing the “sucker” role in this drama, with CNN straight-up going with the press release, along with a fawning treatment of the company that was pushing the story, and NPR going with a mildly skeptical amused tone, interviewing an actual outside expert but still making the mistake of taking the story seriously rather than framing it as a marketing exercise.

We’ve seen this sort of credulous reporting before, perhaps most notably with Theranos and the hyperloop. It’s not just that the news media are suckers, it’s that being a sucker—being credulous—is in many cases a positive for a journalist. A skeptical reporter will run fewer stories, right? Malcolm Gladwell and the Freakonomics team are superstars, in part because they’re willing to routinely turn off whatever b.s. detectors they might have, in order to tell good stories. They get rewarded for their practice of promoting unfounded claims. If we were to imagine an agent-based model of the news media, these are the agents that flow to the top. One could suppose a different model, in which mistakes tank your reputation, but that doesn’t seem to be the world in which we operate.

So, yeah, let’s get mad at the media, first for this bogus champagne story and second for using this as an excuse to promote a bogus company.

Also . . .

Let’s get mad at the institutions of academic science, which for years have been unapologetically promoting crap like himmicanes, air rage, ages ending in 9, nudges, and, let’s never forget, the lucky golf ball.

In terms of wasting money and resources, I don’t think any of those are as consequential as business scams such as Theranos or hyperloop; rather, they bother me because they’re coming from academic science, which might traditionally be considered a more trustworthy source.

And this brings us to Clarke’s law, which you may recall is the principle that any sufficiently crappy research is indistinguishable from fraud.

How does that apply here? I can only assume that the researchers behind the studies of himmicanes, air rage, ages ending in 9, nudges, the lucky golf ball, and all the rest, are sincere and really believe that their claims are supported by their data. But there have been lots of failed replications, along with methodological and statistical explanations of what went wrong in those studies. At some point, to continue to promote them is, in my opinion, on the border of fraud: it requires willfully looking away from contrary evidence and, at the extreme, leads to puffed-up-rooster claims such as, “The replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%.”

In short, the corruption involved in the promotion of academic science has poisoned the well and facilitated the continuing corruption of the news media by business hype.

I’m not saying that business hype and media failure are the fault of academic scientists. Companies would be promoting themselves, and these lazy news organizations would be running glorified press releases, no matter what we were to do in academia. Nor, for that matter, are academics responsible for credulity on stories such as UFO space aliens. The elite news media seems to be able to do this all on its own.

I just don’t think that academic science hype is helping with the situation: it sets up the credulous atmosphere.

Michael Joyner made a similar point a few years ago:

Why was the Theranos pitch so believable in the first place? . . .

Who can forget when James Watson. . . . co-discoverer of the DNA double helix, made a prediction in 1998 to the New York Times that so-called VEGF inhibitors would cure cancer in “two years”?

At the announcement of the White House Human Genome Project in June 2000, both President Bill Clinton and biotechnologist Craig Venter predicted that cancer would be vanquished in a generation or two. . . .

That was followed in 2005 by the head of the National Cancer Institute, Andrew von Eschenbach, predicting the end of “suffering and death” from cancer by 2015, based on a buzzword bingo combination of genomics, informatics, and targeted therapy.

Verily, the life sciences arm of Google, generated a promotional video that has, shall we say, some interesting parallels to the 2014 TedMed talk given by Elizabeth Holmes. And just a few days ago, a report in the New York Times on the continuing medical records mess in the U.S. suggested that with better data mining of more coherent medical records, new “cures” for cancer would emerge. . . .

So, why was the story of Theranos so believable in the first place? In addition to the specific mix of greed, bad corporate governance, and too much “next” Steve Jobs, Theranos thrived in a biomedical innovation world that has become prisoner to a seemingly endless supply of hype.

Joyner also noted that science hype was following patterns of tech hype. For example, this from Dr. Eric Topol, director of the Scripps Translational Science Institute:

When Theranos tells the story about what the technology is, that will be a welcome thing in the medical community. . . . I tend to believe that Theranos is a threat.

The Scripps Translational Science Institute is an academic, or at least quasi-academic, institution! But they’re using tech-hype disrupter terminology by calling the scam company Theranos a “threat” to the existing order. Was the director of the Scripps Translational Science Institute himself committing fraud? I have no reason to think so. What I do think is that he wants to have it both ways. When Theranos was riding high, he hyped it and called it a “threat” (again, that’s a term of praise in this context). Later, after the house of cards fell, he wrote, “I met Holmes twice and conducted a video interview with her in 2013. . . . Like so many others, I had confirmation bias, wanting this young, ambitious woman with a great idea to succeed. The following year, in an interview with The New Yorker, I expressed my deep concern about the lack of any Theranos transparency or peer-reviewed research.” Actually, though, here’s what he said to the New Yorker: “I tend to believe that Theranos is a threat. But if I saw data in a journal, head to head, I would feel a lot more comfortable.” Sounds to me less like deep concern and more like hedging his bets.

Caught like a deer in the headlights between skepticism and fomo.

Extinct Champagne grapes? I can be even more disappointed in the news media

Happy New Year. This post is by Lizzie.

Over the end-of-year holiday period, I always get the distinct impression that most journalists are on holiday too. I felt this more acutely when I found an “urgent” media request in my inbox when I returned to it after a few days away. Someone at a major reputable news outlet wrote:

We are doing a short story on how the climate crisis is causing certain grapes, used in almost all champagne, to be on the brink of extinction. We were hoping to do a quick interview with you on the topic….Our deadline is asap, as we plan to run this story on New Years.

It was late on 30 December, so I had missed helping them, but I still had to reply that I hoped they had found some better information, because ‘the climate crisis is causing certain grapes, used in almost all champagne, to be on the brink of extinction’ is not good information, in my not-so-entirely-humble opinion, as I study this and can think of zero-zilch-nada evidence to support it.

This sounded like the sort of insane news I would expect from more insane media outlets. I tracked down what I assume was the lead they were following (see here), and found that it seems to relate to some AI start-up, which I will not do the service of mentioning, that is just looking for more press. They seem to put out splashy-sounding agricultural press releases often, and so they must have put out one about Champagne grapes being on the brink of extinction to go with New Year’s.

I am on a bad roll with AI just now, or, more exactly, with the intersection of human standards and AI. There’s no good science that “the climate crisis is causing certain grapes, used in almost all champagne, to be on the brink of extinction.” The whole idea is offensive to me when human actions actually are driving species extinct. And it ignores tons of science on winegrapes and the reality that they’re pretty easy to grow (growing excellent ones? Harder). So, poor form on the part of the zero-standards-for-our-science AI startup. But I am more horrified by the media outlets that cannot see through this. I am sure they’re inundated with crazy bogus stories every day, but I thought their job was to report on the ones that matter and that they have some evidence are true.

What did they do instead of that? They gave a platform to “a highly adaptable marketing manager and content creator” to talk about some bogus “study,” and a few soundbites to a colleague of mine who actually knew the science (Ben Cook from NASA).

Here’s a sad post for you to start the new year. The Onion (ok, an Onion-affiliate site) is plagiarizing. For reals.

How horrible. I remember when The Onion started. They were so funny and on point. And now . . . What’s the point of even having The Onion if it’s running plagiarized material? I mean, yeah, sure, everybody’s gotta bring home money to put food on the table. But, really, what’s the goddam point of it all?

Jonathan Bailey has the story:

Back in June, G/O Media, the company that owns A.V. Club, Gizmodo, Quartz and The Onion, announced that they would be experimenting with AI tools as a way to supplement the work of human reporters and editors.

However, just a week later, it was clear that the move wasn’t going smoothly. . . . several months later, it doesn’t appear that things have improved. If anything, they might have gotten worse.

The reason is highlighted in a report by Frank Landymore and Jon Christian at Futurism. They compared the output of A.V. Club’s AI “reporter” against the source material, namely IMDB. What they found were examples of verbatim and near-verbatim copying of that material, without any indication that the text was copied. . . .

The articles in question have a note that reads as follows: “This article is based on data from IMDb. Text was compiled by an AI engine that was then reviewed and edited by the editorial staff.”

However, as noted by the Futurism report, that text does not indicate that any text is copied. Only that “data” is used. The text is supposed to be “compiled” by the AI and then “reviewed and edited” by humans. . . .

In both A.V. Club lists, there is no additional text or framing beyond the movies and the descriptions, which are all based on IMDb descriptions and, as seen in this case, sometimes copied directly or nearly directly from them.

There’s not much doubt that this is plagiarism. Though A.V. Club acknowledges that the “data” came from IMDb, it doesn’t indicate that the language does. There are no quotation marks, no blockquotes, nothing to indicate that portions are copied verbatim or near-verbatim. . . .

Bailey continues:

None of this is a secret. All of this is well known, well-understood and backed up with both hard data and mountains of anecdotal evidence. . . . But we’ve seen this before. Benny Johnson, for example, is an irredeemably unethical reporter with a history of plagiarism, fabrication and other ethical issues that resulted in him being fired from multiple publications.

Yet, he’s never been left wanting for a job. Publications know that, because of his name, he will draw clicks and engagement. . . . From a business perspective, AI is not very different from Benny Johnson. Though the flaws and integrity issues are well known, the allure of a free reporter who can generate countless articles at the push of a button is simply too great to ignore.

Then comes the economic argument:

But therein lies the problem: if you want AI to function like an actual reporter, it has to be edited, fact-checked, and plagiarism-checked just like a real human.

However, when one does those checks, the errors quickly become apparent and fixing them often takes more time and resources than just starting with a human author.

In short, using an AI in a way that helps a company earn/save money means accepting that the factual errors and plagiarism are just part of the deal. It means completely forgoing journalism ethics, just like hiring a reporter like Benny Johnson.

Right now, for a publication, there is no ethical use of AI that is not either unprofitable or extremely limited. These “experiments” in AI are not about testing what the bots can do, but about seeing how much they can still lower their ethical and quality standards and still find an audience.

Ouch.

Very sad to see an Onion-affiliated site doing this.

Here’s how Bailey concludes:

The arc of history has been pulling publications toward larger quantities of lower quality content for some time. AI is just the latest escalation in that trend, and one that publishers are unlikely to ignore.

Even if it destroys their credibility.

No kidding. What next, mathematics professors who copy stories unacknowledged, introduce errors, and then deny they ever did it? Award-winning statistics professors who copy stuff from wikipedia, introducing stupid-ass errors in the process? University presidents? OK, none of those cases were shocking, they’re just sad. But to see The Onion involved . . . that truly is a step further into the abyss.