If school funding doesn’t really matter, why do people want their kid’s school to be well funded?

A question came up about the effects of school funding and student performance, and we were referred to this review article from a few years ago by Larry Hedges, Terri Pigott, Joshua Polanin, Ann Marie Ryan, Charles Tocci, and Ryan Williams:

One question posed continually over the past century of education research is to what extent school resources affect student outcomes. From the turn of the century to the present, a diverse set of actors, including politicians, physicians, and researchers from a number of disciplines, have studied whether and how money that is provided for schools translates into increased student achievement. The authors discuss the historical origins of the question of whether school resources relate to student achievement, and report the results of a meta-analysis of studies examining that relationship. They find that policymakers, researchers, and other stakeholders have addressed this question using diverse strategies. The way the question is asked, and the methods used to answer it, is shaped by history, as well as by the scholarly, social, and political concerns of any given time. The diversity of methods has resulted in a body of literature too diverse and too inconsistent to yield reliable inferences through meta-analysis. The authors suggest that a collaborative approach addressing the question from a variety of disciplinary and practice perspectives may lead to more effective interventions to meet the needs of all students.

I haven’t followed this literature carefully. It was my vague impression that studies have found effects of schools on students’ test scores to be small. So, not clear that improving schools will do very much. On the other hand, everyone wants their kid to go to a good school. Just for example, all the people who go around saying that school funding doesn’t matter, they don’t ask to reduce the funding of their own kids’ schools. And I teach at an expensive school myself. So lots of pieces here, hard for me to put together.

I asked education statistics expert Beth Tipton what she thought, and she wrote:

I think the effect of money depends upon the educational context. For example, in higher education at selective universities, the selection process itself is what ensures success of students – the school matters far less. But in K-12, and particularly in under resourced areas, schools and finances can matter a lot – thus the focus on charter schools in urban locales.

I guess the problem here is that I’m acting like the typical uninformed consumer of research. The world is complicated, and any literature will be a mess, full of claims and counter-claims, but here I am expecting there to be a simple coherent story that I can summarize in a short sentence (“Schools matter” or “Schools don’t matter” or, maybe, “Schools matter but only a little”).

Given how frustrated I get when others come into a topic with this attitude, I guess it’s good for me to recognize when I do it.

Social penumbras predict political attitudes (my talk at Harvard on Monday Feb 12 at noon)

Monday, February 12, 2024, 12:00pm to 1:15pm

Social penumbras predict political attitudes

The political influence of a group is typically explained in terms of its size, geographic concentration, or the wealth and power of the group’s members. This article introduces another dimension, the penumbra, defined as the set of individuals in the population who are personally familiar with someone in that group. Distinct from the concept of an individual’s social network, penumbra refers to the circle of close contacts and acquaintances of a given social group. Using original panel data, the article provides a systematic study of various groups’ penumbras, focusing on politically relevant characteristics of the penumbras (e.g., size, geographic concentration, sociodemographics). Furthermore, we show the connection between changes in penumbra membership and public attitudes on policies related to the group.

This is based on a paper with Yotam Margalit from 2021.

“Replicability & Generalisability”: Applying a discount factor to cost-effectiveness estimates.

This one’s important.

Matt Lerner points us to this report by Rosie Bettle, Replicability & Generalisability: A Guide to CEA discounts.

“CEA” is cost-effectiveness analysis, and by “discounts” they mean what we’ve called the Edlin factor. “Discount” is a better name than “factor,” because it’s a number that should be between 0 and 1: it’s what you multiply a point estimate by to adjust for the inevitable upward biases in reported effect-size estimates, issues discussed here and here, for example.

It’s pleasant to see some of my ideas being used for a practical purpose. I would just add that type M and type S errors should be lower for Bayesian inferences than for raw inferences that have not been partially pooled toward a reasonable prior model.
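To give a sense of why such a discount is needed, here’s a quick simulation sketch (my own toy numbers, nothing from the report): estimates that pass a significance filter systematically exaggerate the underlying effect, and multiplying them by a discount such as 1/2 pulls them back toward reality.

```python
import numpy as np

rng = np.random.default_rng(1)

true_effect = 0.1    # hypothetical small true effect
se = 0.1             # standard error of each study's estimate
n_sims = 100_000

estimates = rng.normal(true_effect, se, n_sims)
significant = np.abs(estimates) > 1.96 * se    # the significance filter

# Type M (magnitude) error: estimates that survive the filter exaggerate the true effect
exaggeration = np.mean(np.abs(estimates[significant])) / true_effect
print(f"Average exaggeration among 'significant' estimates: {exaggeration:.1f}x")

# A crude Edlin-style discount multiplies the reported estimate by a number in (0, 1)
discount = 0.5
reported = np.mean(np.abs(estimates[significant]))
print(f"Mean reported (significant) estimate: {reported:.3f}")
print(f"After a 0.5 discount:                 {discount * reported:.3f}  (true effect: {true_effect})")
```

In this toy setup the filter-passing estimates exaggerate the true effect by a factor of roughly 2.5, which is why a default discount of 1/2 is not as drastic as it might sound.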

Also, regarding empirical estimation of adjustment factors, I recommend looking at the work of Erik van Zwet et al; here are some links:
What’s a good default prior for regression coefficients? A default Edlin factor of 1/2?
How large is the underlying coefficient? An application of the Edlin factor to that claim that “Cash Aid to Poor Mothers Increases Brain Activity in Babies”
The Shrinkage Trilogy: How to be Bayesian when analyzing simple experiments
Erik van Zwet explains the Shrinkage Trilogy
The significance filter, the winner’s curse and the need to shrink
Bayesians moving from defense to offense: “I really think it’s kind of irresponsible now not to use the information from all those thousands of medical trials that came before. Is that very radical?”
Explaining that line, “Bayesians moving from defense to offense”

I’m excited about the application of these ideas to policy analysis.

A new argument for estimating the probability that your vote will be decisive

Toby Ord writes:

I think you will like this short proof that puts a lower bound on the probability that one’s vote is decisive.

It requires just one assumption (that the probability distribution over vote share is unimodal) and takes two inputs (the number of voters & the probability the underdog wins). It shows that in (single level) elections that aren’t forgone conclusions, the chance your vote is decisive can’t be much lower than 1 in the number of voters (and I show where some models that say otherwise go wrong).

Among other things, this makes it quite plausible that the moral value of voting is positive in expectation, since the aggregate value scales with n, while the probability scales with 1/n. Voting would produce net-value roughly when the value of your preferred candidate to the average citizen exceeds the cost to you of voting.

This relates to my paper with Edlin and Kaplan, “Voting as a rational choice: why and how people vote to improve the well-being of others.”
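To see roughly where the 1/n scaling comes from, here’s a toy version of the calculation from our paper (just an illustration, not Ord’s proof): if your forecast of the candidate’s vote share puts appreciable probability density near 50%, the chance that the other voters split exactly evenly is approximately that density divided by n.

```python
from scipy import stats

n_voters = 1_000_000                       # hypothetical electorate size
forecast_mean, forecast_sd = 0.52, 0.03    # made-up forecast of the candidate's vote share

# P(your vote is decisive) is roughly the forecast density of the vote share
# at 0.5, divided by the number of voters.
density_at_half = stats.norm.pdf(0.5, loc=forecast_mean, scale=forecast_sd)
p_decisive = density_at_half / n_voters

print(f"Approximate P(decisive vote): {p_decisive:.1e}")
print(f"1/n for comparison:           {1 / n_voters:.1e}")
```

With a forecast like this one, the answer comes out around 10/n; it drops far below 1/n only when the election is a foregone conclusion.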

I was happy to see that Ord’s article mentioned the point made in the appendix of our 2004 paper, as it addresses a question that often arises: the claim that a vote can never be decisive because any election close enough to be decided by one vote would go to a recount.

Also, some background: it’s my impression that the p = 10^-90 crowd (that is, the people who assign ridiculously small probabilities of a single vote being decisive) are typically not big fans of the idea of democracy, so it is convenient for them to suppose that voting doesn’t matter.

I’m not saying the p = 10^-90 people are cynical, as they may sincerely believe that democracy is overrated, and then this is compounded by innumeracy. Probability is difficult!

And then there’s just the general issue that people seem to have the expectation that, when there’s any sort of debate, all the arguments must necessarily go in their favor, so they’ll twist and turn a million ways to avoid grappling with contrary arguments; see for example this discussion thread where I tried to clarify a point but it didn’t work.

Regarding this last point, Ord writes:

I hope it is useful to have a simple formula for a completely safe lower bound for the chance a vote is decisive. Not the same as your empirically grounded versions, but nice to show people who don’t trust the data or the more complex statistical analysis.

Hey, I got tagged by RetractoBot!

A message came in my inbox from “The RetractoBot Team, University of Oxford,” with subject line, “RetractoBot: You cited a retracted paper”:

That’s funny! When we cited that paper by LaCour and Green, we already knew it was no good. Indeed, that’s why we cited it. Here’s the relevant paragraph from our article:

In political science, the term “replication” has traditionally been applied to the simple act of reproducing a published result using the identical data and code as used in the original analysis. Anyone who works with real data will realize that this exercise is valuable and can catch problems with sloppy data analysis (e.g., the Excel error of Reinhart and Rogoff 2010, or the “gremlins” article of Tol 2009, which required nearly as many corrections as the number of points in its dataset; see Gelman 2014). Reexamination of raw data can also expose mistakes, such as the survey data of LaCour and Green (2014); see Gelman (2015).

We also cited two other notorious papers, Reinhart and Rogoff (2010) and Tol (2009), both of which should have been retracted but are still out there in the literature. According to Google Scholar, Reinhart and Rogoff (2010) has been cited more than 5000 times! I guess that many of these citations are from articles such as mine, using it as an example of poor workflow, but still. Meanwhile, Tol (2009) has been cited over 1500 times. It does have a “correction and update” from 2014, but that hardly covers its many errors and inconsistencies.

Anyway, I can’t blame RetractoBot for not noticing the sense of my citation; it’s just funny how they sent that message.

Sympathy for the Nudgelords: Vermeule endorsing stupid and dangerous election-fraud claims and Levitt promoting climate change denial are like cool dudes in the 60s wearing Che T-shirts and thinking Chairman Mao was cool—we think they’re playing with fire, they think they’re cute contrarians pointing out contradictions in the system. For a certain kind of person, it’s fun to be a rogue.

A few months ago I wrote about some disturbing stuff I’d been hearing about from Harvard Law School professors Cass Sunstein and Adrian Vermeule. The two of them wrote an article back in 2005 in which they argued, “a refusal to impose [the death] penalty condemns numerous innocent people to death. . . . a serious commitment to the sanctity of human life may well compel, rather than forbid that form of punishment. . . .”

My own view is that the death penalty makes sense in some settings and not others. To say that “a serious commitment to the sanctity of human life may well compel” the death penalty . . . jeez, I dunno, that’s some real Inquisition-level thinking going on. Not just supporting capital punishment; they’re saying it may be morally compelled. That’s a real edgelord attitude, kinda like the thought-provoking professor in your freshman ethics class who argues that companies have not just the right but the moral responsibility to pollute the maximum amount possible under the law because otherwise they’re ducking their fiduciary responsibility to the shareholders. Indeed, by that logic it’s arguably immoral not to pollute beyond the limits of the law whenever the expected gain from polluting exceeds the expected loss from getting caught and fined.

Sunstein and Vermeule also recommended that the government should fight conspiracy theories by engaging in “cognitive infiltration of extremist groups,” which seemed pretty rich, considering that Vermeule spent his online leisure hours after the 2020 election promoting election conspiracy theories. Talk about the fox guarding the henhouse. This is one guy I would not trust to be in charge of government efforts to cognitively infiltrate extremist groups!

Meanwhile, these guys go on NPR, they’ve held appointive positions with the U.S. government, they’re buddies with elite legal academics . . . it bothers me! I’m not saying their free speech should be suppressed—we got some Marxists running around in this country too—I just don’t want them anywhere near the levers of power.

Anyway, I heard by email from someone who knows Sunstein and Vermeule. It seems that both of them are nice guys, and when they stick to legal work and stay away from social science or politics they’re excellent scholars. My correspondent also wrote:

And on that 2019 Stasi tweet. Yes, it was totally out of line. You and others were right to denounce it. But I think it’s worth pointing out that he deleted the tweet the very same day (less than nine hours later), apologized for it as an ill-conceived attempt at humor, and noted with regret that the tweet came across as “unkind and harsh to good people doing good and important work.” I might gently and respectfully suggest that continuing to bring up this tweet four years later, after such a prompt retraction—which was coupled with an acknowledgement of the value of the work that you and others are doing in focusing on the need for scrutiny and replication of eye-catching findings—might be perceived as just a tad ungracious, even by those who believe that Cass was entirely in the wrong and you were entirely in the right as regards the original tweet. To paraphrase one of the great capital defense lawyers (who obviously said this in a much more serious context), all of us are better than our worst moment.

I replied:

– Regarding the disjunction between Vermeule’s scholarly competence and nice-guyness, on one hand, and his extreme political views, on the other: I can offer a statistical or population perspective. Think of a Venn diagram where the two circles are “reasonable person” and “extreme political views and actions.” (I’m adding “actions” here to recognize that the issue is not just that Vermeule thinks that a fascist takeover would be cool, but that he’s willing to sell out his intellectual integrity for it, in the sense of endorsing ridiculous claims.)

From an ethical point of view, there’s an argument in favor of selling out one’s intellectual integrity for political goals. One can make this argument for Vermeule or also for, say, Ted Cruz. The argument is that the larger goal (a fascist government in the U.S., or more power for Ted Cruz) is important enough that it’s worth making such a sacrifice. Or, to take slightly lesser examples, the argument would be that when Hillary Clinton lied about her plane being shot at, or when Donald Trump lied about . . . ok, just about everything, that they were thinking about larger goals. Indeed, one could argue that for Cruz and the other politicians, it’s not such a big deal—nobody expects politicians to believe half of what they’re saying anyway—but for Vermeule to trash his reputation in this way, that shows real commitment!

Actually, I’m guessing that Vermeule was just spending too much time online in a political bubble, and he didn’t really think that endorsing these stupid voter-fraud claims meant anything. To put it another way, you and I think that endorsing unsubstantiated claims of voting fraud is bad for three reasons: (1) intellectually it’s dishonest to claim evidence for X when you have no evidence for X, (2) this sort of thing is dangerous in the short term by supplying support to traitors, and (3) it’s dangerous in the long term by degrading the democratic process. But, for Vermeule, #2 and #3 might well be a plus, not a minus, and, as for #1, I think it’s not uncommon for people to make a division between their professional and non-professional statements, and to have a higher standard for the former than the latter. Vermeule might well think, “Hey, that’s just twitter, it’s not real.” Similarly, the economist Steven Levitt and his colleagues wrote all sorts of stupid things (along with many smart things) under the Freakonomics banner, things which I guess (or, should I say, hope) he’d never have done in his capacity as an academic. Just to be clear, I’m not saying that everyone does this, indeed I don’t think I do it—I stand by what I blog, just as I stand by my articles and books—but I don’t think everyone does. Another example that’s kinda famous is biologists who don’t believe in evolution. They can just separate the different parts of their belief systems.

Anyway, back to the Venn diagram. The point is that something like 30% of Americans believe this election fraud crap. 30% of Americans won’t translate into 30% of competent and nice-guy law professors, but it won’t be zero, either. Even if it’s only 10% or less in that Venn overlap, it won’t be zero. And the people inside that overlap will get attention. And some of them like the attention! So at that point you can get people going further and further off the deep end.

If it would help, you could think of this as a 2-dimensional scatterplot rather than a Venn diagram, and in this case you can picture the points drifting off to the extreme over time.

To look at this another way, consider various well-respected people in the U.S. and Britain who were communists in the 1930s through 1950s. Some of these people were scientists! And they said lots of stupid things. From a political perspective, that’s all understandable: even if they didn’t personally want to tear up families, murder political opponents, start wars, etc., they could make the case that Stalin’s USSR was a counterweight to fascism elsewhere. But from an intellectual perspective, they wouldn’t always make that sort of minimalist case. Some of them were real Soviet cheerleaders. Again, who knows what moral calculations they were making in their heads.

I’m not gonna go all Sunstein-level contrarian and argue that selling out one’s intellectual integrity is the ultimate moral sacrifice—I’m picturing a cartoon where Vermeule is Abraham, his reputation is Isaac, and the Lord is thundering above, booming down at him to just do it already—but I guess the case could be made, indeed maybe will be the subject of one of the 8 books that Sunstein comes out with next year and is respectfully reviewed on NPR etc.

– Regarding the capital punishment article: I have three problems here. The first is their uncritical acceptance of a pretty dramatic claim. In Sunstein and Vermeule’s defense, though, back in 2005 it was standard in social science for people to think that statistical significance + identification strategy + SSRN or NBER = discovery. Indeed, I’d guess that most academic economists still think that way! So to chide them on their innumeracy here would be a bit . . . anachronistic, I guess. The second problem is that, I’m guessing, the reason they were so eager to accept this finding is that it allowed them to make this cool point that they wanted to make. If they’d said, “Here’s a claim, maybe it’s iffy but if it’s true, it has some interesting ethical implications…”, that would be one thing. But that’s not what I read their paper as saying. By saying “Recent evidence suggests that capital punishment may have a significant deterrent effect” and not considering the opposite, they’re making the fallacy of the one-way bet. My third problem is that I think their argument is crap, even setting aside the statistical study. I discussed this a bit in my post. There are two big issues they’re ignoring. The first is that if each execution saves 18 lives, then maybe we should start executing innocent people! Or, hey, we can find some guilty people to execute, maybe some second-degree murderers, armed robbers, arsonists, tax evaders, speeders, jaywalkers, . . . . shouldn’t be too hard to find some more targets–after all, they used to have the death penalty for forgery. Just execute a few hundred of them and consider how many lives will be saved. That may sound silly to you, but it’s Sunstein and Vermeule, not me, who wrote that bit about “a serious commitment to the sanctity of human life.” I discussed the challenges here in more detail in a 2006 post; see the section, “The death penalty as a decision-analysis problem?” My point is not that they have to agree with me, just that it’s not a good sign that their long-ass law article with its thundering about “the sanctity of human life” is more shallow than two paragraphs of a blog post.

In summary regarding the death-penalty article, I’m not slamming them for falling for crappy research (that’s what social scientists and journalists did back in 2005, and lots of them still do to this day) and I’m not slamming them for supporting the death penalty (I’ve supported it too, at various times in my life; more generally I think it depends on the situation and that the death penalty can be a good idea in some circumstances, even if the current version in the U.S. doesn’t work so well). I’m slamming them for taking half-assed reasoning and presenting it as sophisticated. I’d say they don’t know better, they’re just kinda dumb—but you assure me that Vermeule is actually smart. So my take on it is that they’re really good at playing the academic game. For me to criticize their too-clever-by-half “law and economics” article as not being well thought through, that would be like criticizing LeBron James for not being a golf champion. They do what’s demanded of them in their job.

– Regarding Sunstein’s ability to learn from error: Yes, I mention in my post that Sunstein was persuaded by the article by Wolfers and Donohue. I do think it was good that Sunstein retracted his earlier stance. That’s one reason I was particularly disappointed by what he and his collaborator did in the second edition of Nudge, which was to memory-hole the Wansink episode. It was such a great opportunity in the revision, for them to have said that the nudge idea is so compelling that they (and many others) were fooled, and to consider the implications: in a world where people are rewarded for discovering apparently successful nudges, the Wansinks of the world will prosper, at least in the short term. Indeed, Sunstein and Thaler could’ve even put a positive spin on it by talking about the self-correcting nature of science, sunlight is the best disinfectant, etc. But, no, instead they remove it entirely, and then Sunstein returns to his previous credulous self by posting something on what he called the “coolest behavioral finding of 2019.” Earlier they’d referred to Wansink as having had multiple masterpieces. Kind of makes you question their judgment, no? My take on this is . . . for them, everyone’s a friend, so why rock the boat? As I wrote, it looks to me like an alliance of celebrities. I’m guessing that they are genuinely baffled by people like Uri Simonsohn or me who criticize this stuff: Don’t we have anything better to do? It’s natural to think of behavior of Simonsohn, me, and other “data thugs” as being kinda pathological: we are jealous, or haters, or glory-seekers, or we just have some compulsion to be mean (the kind of people who, in another life, would be Stasi).

– Regarding the Stasi quote: Yes, I agree it’s a good thing Sunstein retracted it. I was not thrilled that in the retraction he said he’d thought it had “a grain of truth,” but, yeah, as retractions go, it was much better than average! Much better than the person who called people “terrorists,” never retracted or apologized, then later published an article lying about a couple of us (a very annoying episode to me, which I have to kind of keep quiet about cos nobody likes a complainer, but grrrr it burns me up, that people can just lie in public like that and get away with it). So, yes, for sure, next time I write about this I will emphasize that he retracted the Stasi line.

– Libertarian paternalism: There’s too much on this for one email, but for my basic take, see this post, in particular the section “Several problems with science reporting, all in one place.” This captures it: Sunstein is all too willing to think that ordinary people are wrong, while trusting the testimony of Wansink, who appears to have been a serial fabricator. It’s part of a world in which normies are stupidly going about their lives doing stupid things, and thank goodness (or, maybe I should say in deference to Vermeule, thank God) there are leaders like Sunstein, Vermeule, and Wansink around to save us from ourselves, and also in the meantime go on NPR, pat each other on the back on Twitter, and enlist the U.S. government in their worthy schemes.

– People are complicated: Vermeule and Sunstein are not “good guys” or “bad guys”; they’re just people. People are complicated. What makes me sad about Sunstein is that, as you said, he does care about evidence, he can learn from error. But then he chooses not to. He chooses to stay in his celebrity comfort zone, making stupid arguments evaluating the president’s job performance based on the stock market, cheerleading biased studies about nudges as if they represent reality. See the last three paragraphs here. Another bad thing Sunstein did recently was to coauthor that Noise book. Another alliance of celebrities! (As a side note, I’m sad to see the collection of academic all-star endorsements that this book received.) Regarding Sunstein himself, see the section “A new continent?” of that post. As I wrote at the time, if you’re going to explore a new continent, it can help to have a local guide who can show you the territory.

Vermeule I know less about; my take is that he’s playing the politics game. He thinks that on balance the Republicans are better than the Democrats, and I’m guessing that when he promotes election fraud misinformation, that he just thinks he’s being mischievous and cute. After all, the Democrats promoted misinformation about police shootings or whatever, so why can’t he have his fun? And, in any case, election security is important, right? Etc etc etc. Anyone with a bit of debate-team experience can justify lots worse than Vermeule’s post-election tweets. I guess they’re not extreme enough for Sunstein to want to stop working with him.

– Other work by Vermeule and Sunstein: They’re well-respected academics, and you and others say how smart they are, so I can well believe they’ve also done high-quality work. It might be that their success in some subfields led them into a false belief that they know what they’re doing in other areas (such as psychology research, statistics, and election administration) where they have no expertise. As the saying goes, sometimes it’s important to know what you don’t know.

My larger concern, perhaps, is that these people get such deference in academia and the news media, that they start to believe their own hype and they think they’re experts in everything.

– Conspiracy theories: Sunstein and Vermeule wrote, “Many millions of people hold conspiracy theories; they believe that powerful people have worked together in order to withhold the truth about some important practice or some terrible event. A recent example is the belief, widespread in some parts of the world, that the attacks of 9/11 were carried out not by Al Qaeda, but by Israel or the United States.” My point here is that there are two conspiracy theories here: a false conspiracy theory that the attacks were carried out by Israel or the United States, and a true conspiracy theory that the attacks were carried out by Al Qaeda. In the meantime, Vermeule has lent his support to unsupported conspiracy theories regarding the 2020 election. So Vermeule is incoherent. On one hand, he’s saying that conspiracy theories are a bad thing. On the other hand, in one place he’s not recognizing the existence of true conspiracies; in another place he’s supporting ridiculous and dangerous conspiracy theories, I assume on the basis that they are in support of his political allies. I don’t think it’s a cheap shot to point out this incoherence.

And what does it mean that Sunstein thinks that “Because those who hold conspiracy theories typically suffer from a ‘crippled epistemology,’ in accordance with which it is rational to hold such theories, the best response consists in cognitive infiltration of extremist groups.”—but he continues to work with Vermeule? Who would want to collaborate with someone who suffers from a crippled epistemology (whatever that means)? The whole thing is hard for me to interpret except as an elitist position where some people such as Sunstein and Vermeule are allowed to believe whatever they want, and hold government positions, while other people get “cognitively infiltrated.”

– The proposed government program: I see your point that when the government is infiltrating dangerous extremist groups, it could make sense for them to try to talk some of these people out of their extremism. After all, for reasons of public safety the FBI and local police are already doing lots of infiltration anyway—they hardly needed Sunstein and Vermeule’s encouragement. Overall I suspect it’s a good thing that the cops are gathering intelligence this way rather than just letting these groups make plans in secret, set off bombs, etc., and once the agents are on the inside, I’d rather have them counsel moderation than do that entrapment thing where they try to talk people into planning crimes so as to be able to get more arrests.

I think what bothers me about the Sunstein and Vermeule article—beyond that they’re worried about conspiracy theories while themselves promoting various con artists and manipulators—is their assumption that the government is on the side of the good. Perhaps this is related to Sunstein being pals with Kissinger. I labeled Sunstein and Vermeule as libertarian paternalists, but maybe Vermeule is better described as an authoritarian; in any case they seem to have the presumption that the government is on their side, whether it’s for nudging people to do good things (not to do bad things) or for defusing conspiracy theories (not to support conspiracy theories).

But governments can’t always be trusted. When I wrote, “They don’t even seem to consider a third option, which is the government actively promoting conspiracy theories,” it’s not that I was saying that this third option was a good thing! Rather, I was saying that the third option is something that’s actually done, and I gave examples of the U.S. executive branch and much of Congress in the period Nov 2020 – Jan 2021, and the Russian government in their invasion of Ukraine. And it seems that Vermeule may well be cool with both these things! So my reaction to Vermeule saying the government should be engaging in information warfare is similar to my reaction when the government proposed to start a terrorism-futures program and have it be run by an actual terrorist: it might be a good idea in theory and even in practice, but (a) these are not the guys I would want in charge of such a program, and (b) their enthusiasm for it makes me suspicious.

– Unrelated to all the above: You say of Vermeule, “after his conversion to Catholicism, he adopted the Church’s line on moral opposition to capital punishment.” That’s funny because I thought the Catholic church was cool with the death penalty—they did the inquisition, right?? Don’t tell me they’ve flip-flopped! Once they start giving into the liberals on the death-penalty issue, all hell will break loose.

OK, why did I write all that?

1. The mix of social science, statistical evidence, and politics is interesting and important.

2. As an academic, I’m always interested in academics behaving badly, especially when it involves statistics or social science in some way. In particular, the idea that these guys are supposed to be so smart and so nice in regular life, and then they go with these not-so-smart, not-so-nice theories, that’s interesting. When mean, dumb people promote mean, dumb ideas, that’s not so interesting. But when nice, smart people do it . . .

3. It’s been unfair to Sunstein for me to keep bringing up that Stasi thing.

Regarding item 2, one analogy I can see with Vermeule endorsing stupid and dangerous election-fraud claims is dudes in the 60s wearing Che T-shirts and thinking Chairman Mao was cool. From one perspective, Che was one screwed-up dude and Mao was one of history’s greatest monsters . . . but both of them were bad-ass dudes and it was cool to give the finger to the Man. Similarly, Vermeule could well think of Trump as badass, and he probably thinks it’s hilarious to endorse B.S. claims that support his politics. Kinda like how Steven Levitt probably thinks he’s a charming mischievous imp by supporting climate denialists. Levitt would not personally want his (hypothetical) beach house on Fiji to be flooded, but, for a certain kind of person, it’s fun to be a rogue.

Here’s what I wrote when the topic came up before:

There’s no evidence that Vermeule was trying to overthrow the election. He was merely supportive of these efforts, not doing it himself, in the same way that an academic Marxist might root for the general strike and the soviet takeover of government but not be doing anything active on the revolution’s behalf.

Resources for teaching and learning survey sampling, from Scott Keeter at Pew Research

Art Owen informed me that he’ll be teaching sampling again at Stanford, and he was wondering about ideas for students gathering their own data.

I replied that I like the idea of sampling from databases, biological sampling, etc. You can point out to students that a “blood sample” is indeed a sample!

Art replied:

Your blood example reminds me that there is a whole field (now very old) on bulk sampling. People sample from production runs, from cotton samples, from coal samples and so on. Widgets might get sampled from the beginning, middle and end of the run. David Cox wrote some papers on sampling to find the quality of cotton as measured by fiber length. The process is to draw a blue line across the sample and see the length of fibers that intersect the line. This gives you a length-biased sample that you can nicely de-bias. There’s also an interesting example out there about tree sampling, literally on a tree, where branches get sampled at random and fruit is counted. I’m not sure if it’s practical.

Last time I found an interesting example where people would sample ocean tracts to see if there was a whale. If they saw one, they would then sample more intensely in the neighboring tracts. Then the trick was to correct for the bias that brings. It’s in the Sampling book by S. K. Thompson. There are also good mark-recapture examples for wildlife.

I hesitate to put a lot of regression in a sampling class; it is all too easy for every class to start looking like a regression/prediction/machine learning class. We need room for the ideas about where and how data arises and it’s too easy to crowd those out by dwelling on the modeling ideas.

I’ll probably toss in some space-filling sampling plans and other ways to downsize data sets as well.

The old Cochran style was: get an estimator, show it is unbiased, find an expression for its variance, find an estimate of that variance, show this estimate is unbiased and maybe even find and compare variances of several competing variance estimates. I get why he did it but it can get dry. I include some of that but I don’t let it dominate the course. Choices you can make and their costs are more interesting.
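The cotton example is a classic case of length-biased sampling, so here’s a minimal sketch of the de-biasing idea (my own toy illustration, not Cox’s actual method): fibers are selected with probability proportional to their length, so the raw sample mean is too big, but weighting each sampled fiber by 1/length, which amounts to taking the harmonic mean, recovers the population mean.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population of fiber lengths
pop_lengths = rng.gamma(shape=2.0, scale=1.5, size=100_000)
true_mean = pop_lengths.mean()

# Length-biased sampling: probability of selection proportional to length
probs = pop_lengths / pop_lengths.sum()
sample = rng.choice(pop_lengths, size=2_000, replace=True, p=probs)

naive_mean = sample.mean()                    # biased upward
debiased_mean = 1.0 / np.mean(1.0 / sample)   # weighting by 1/length = harmonic mean

print(f"True mean length:  {true_mean:.2f}")
print(f"Naive sample mean: {naive_mean:.2f}  (too large)")
print(f"Debiased estimate: {debiased_mean:.2f}")
```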

I connected Art to Scott Keeter at Pew Research, who wrote:

Fortunately, we are pretty diligent about keeping track of what we do and writing it up. The examples below have lengthy methodology sections and often there is companion material (such as blog posts or videos) about the methodological issues.

We do not have a single overview methodological piece about this kind of work but the next best thing is a great lecture that Courtney Kennedy gave at the University of Michigan last year, walking through several of our studies and the considerations that went into each one:

Here are some links to good examples, with links to the methods sections or extra features:

Our recent study of Jewish Americans, the second one we’ve done. We switched modes for this study (thus different sampling strategy), and the report materials include an analysis of mode differences https://www.pewresearch.org/religion/2021/05/11/jewish-americans-in-2020/

Appendix A: Survey methodology

Jewish Americans in 2020: Answers to frequently asked questions

Our most recent survey of the US Muslim population:

U.S. Muslims Concerned About Their Place in Society, but Continue to Believe in the American Dream


A video on the methods:
https://www.pewresearch.org/fact-tank/2017/08/16/muslim-americans-methods/

This is one of the most ambitious international studies we’ve done:

Religion in India: Tolerance and Segregation


Here’s a short video on the sampling and methodology:
https://www.youtube.com/watch?v=wz_RJXA7RZM

We then had a quick email exchange:

Me: Thanks. Post should appear in Aug.

Scott: Thanks. We’ll probably be using sampling by spaceship and data collection with telepathy by then.

Me: And I’ll be charging the expenses to my NFT.

In a more serious vein, Art looked into Scott’s suggestions and followed up:

I [Art] looked at a few things at the Pew web-site. The quality of presentation is amazingly good. I like the discussions of how you identify who to reach out to. Also the discussion of how to pose the gender identity question is something that I think would interest students. I saw some of the forms and some of the data on response rates. I also found Courtney Kennedy’s video on non-probability polls. I might avoid religious questions for in-depth followup in class. Or at least, I would have to be careful in doing it, so nobody feels singled out.

Where could I find some technical documents about the American Trends Panel? I would be interested to teach about sample reweighting, e.g., raking and related methods, as it is done for real.

I’m wondering about getting survey data for a class. I might not be able to require them to get a Pew account and then agree to terms and conditions. Would it be reasonable to share a downsampled version of a Pew data set with a class? Something about attitudes to science would be interesting for students.

To which Scott replied:

Here is an overview I wrote about how the American Trends Panel operates and how it has changed over time in response to various challenges:

Growing and Improving Pew Research Center’s American Trends Panel

This relatively short piece provides some good detail about how the panel works:
https://www.pewresearch.org/fact-tank/2021/09/07/how-do-people-in-the-u-s-take-pew-research-center-surveys-anyway/

We use the panel to conduct lots of surveys, but most of them are one-off efforts. We do make an effort to track trends over time, but that’s usually the way we used to do it when we conducted independent sample phone surveys. However, we sometimes use the panel as a panel – tracking individual-level change over time. This piece explains one application of that approach:
https://www.pewresearch.org/fact-tank/2021/01/20/how-we-know-the-drop-in-trumps-approval-rating-in-january-reflected-a-real-shift-in-public-opinion/

When we moved from mostly phone surveys to mostly online surveys, we wanted to assess the impact of the change in mode of interview on many of our standard public opinion measures. This study was a randomized controlled experiment to try to isolate the impact of mode of interview:

From Telephone to the Web: The Challenge of Mode of Interview Effects in Public Opinion Polls

Survey panels have some real benefits but they come with a risk – that panelists change as a result of their participation in the panel and no longer fully resemble the naïve population. We tried to assess whether that is happening to our panelists:

Measuring the Risks of Panel Conditioning in Survey Research

We know that all survey samples have biases, so we weight to try to correct those biases. This particular methodology statement is more detailed than is typical and gives you some extra insight into how our weighting operates. Unfortunately, we do not have a public document that breaks down every step in the weighting process:

Methodology

Most of our weighting parameters come from U.S. government surveys such as the American Community Survey and the Current Population Survey. But some parameters are not available on government surveys (e.g., religious affiliation) so we created our own higher quality survey to collect some of these for weighting:

How Pew Research Center Uses Its National Public Opinion Reference Survey (NPORS)

This one is not easy to find on our website but it’s a good place to find wonky methodological content, not just about surveys but about our big data projects as well:

Home


We used to publish these through Medium but decided to move them in-house.

By the way, my colleagues in the survey methods group have developed an R package for the weighting and analysis of survey data. This link is to the explainer for weighting data but that piece includes links to explainers about the basic analysis package:
https://www.pewresearch.org/decoded/2020/03/26/weighting-survey-data-with-the-pewmethods-r-package/

Lots here to look at!
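Since Art asked about raking, here’s a minimal sketch of the core iteration on a toy survey (my own illustration; not how the pewmethods package does it): alternately rescale the weights so that each margin, here sex and age group, matches its population target.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy survey: each respondent has a sex (0/1) and an age group (0/1/2)
n = 1_000
sex = rng.integers(0, 2, n)
age = rng.integers(0, 3, n)

# Hypothetical population targets for each margin (proportions summing to 1)
target_sex = np.array([0.48, 0.52])
target_age = np.array([0.30, 0.40, 0.30])

weights = np.ones(n)
for _ in range(50):  # iterate until the margins converge
    for groups, targets in [(sex, target_sex), (age, target_age)]:
        # current weighted total in each category, then rescale toward the target
        current = np.array([weights[groups == g].sum() for g in range(len(targets))])
        factors = targets * weights.sum() / current
        weights *= factors[groups]

# Check: the weighted margins now match the targets
for g in range(2):
    print(f"sex={g}: weighted share = {weights[sex == g].sum() / weights.sum():.3f}")
for g in range(3):
    print(f"age={g}: weighted share = {weights[age == g].sum() / weights.sum():.3f}")
```

In practice you’d also want to check convergence and worry about extreme weights; that’s where the real-world complications come in.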

It’s been a while since I’ve taught a course on survey sampling. I used to teach such a course—it was called Design and Analysis of Sample Surveys—and I enjoyed it. But . . . in the class I’d always have to spend some time discussing basic statistics and regression modeling, and this always was the part of the class that students found the most interesting! So I eventually just started teaching statistics and regression modeling, which led to my Regression and Other Stories book. The course I’m now teaching out of that book is called Applied Regression and Causal Inference. I still think survey sampling is important; it was just hard to find an audience for the course.

“My view is that if I can show that a result was cooked and that doing it correctly does not yield the answer the authors claimed, then the result is discredited. . . . What I hear, instead, is the following . . .”

Economic historian Tim Guinnane writes:

I have a general question that I have not seen addressed on your blog. Often this question turns into a narrow question about retracting papers, but I think that short-circuits an important discussion.

Like many in economic history, I am increasingly worried that much research in recent years reflects p-hacking, misrepresentation of the history, useless data, and other issues. I realize that the technical/statistical issues differ from paper to paper.

What I see is something like the following. You can use this paper as a concrete example, but the problems are much more widespread. We document a series of bad research practices. The authors played games with controls to get the “right” answer for the variable of interest. (See Table 1 of the paper). In the text they misrepresent the definitions of variables used in regressions; we show that if you use the stated definition, their results disappear. They use the wrong degrees of freedom to compute error bounds (in this case, they had to program the bounds by hand, since Stata automatically uses the right df). There are other, and to our minds more serious, problems involved in selectively dropping data, claiming sources do not exist, etc.

Step back from any particular problem. How should the profession think about claims such as ours? My view is that if I can show that a result was cooked and that doing it correctly does not yield the answer the authors claimed, then the result is discredited. The journals may not want to retract such work, but there should be support for publishing articles that point out such problems.

What I hear, instead, is the following. A paper estimates beta as .05 with a given SE. Even if we show that this is cooked—that is, that beta is a lot smaller or the SE a lot larger if you do not throw in extraneous regressors, or play games with variable definitions—then ours is not really a result. It is instead, I am told, incumbent on the critic to start with beta=.05 as the null, and show that doing things correctly rejects that null in favor of something less than .05 (it is characteristic of most of this work that there really is no economic theory, so the null is always “X does not matter” which boils down to “this beta is zero.” And very few even tell us whether the correct test is one- or two-sided).

This pushback strikes me as weaponizing the idea of frequentist hypothesis testing. To my mind, if I can show that beta=.05 comes from a cooked regression, then we need to start over. That estimate can be ignored; it is just one of many incorrect estimates one can generate by doing things inappropriately. It actually gives the unscrupulous an incentive to concoct more outlandish betas which are then harder to reject. More generally, it puts a strange burden of proof on critics. I have discussed this issue with some folks in natural sciences who find the pushback extremely difficult to understand. They note what I think is the truth: it encourages bad research behavior by suppressing papers that demonstrate that bad behavior.

It might be opportune to have a general discussion of these sorts of issues on your website. The Gino case raises something much simpler, I think. I fear that it will in some ways lower the bar: so long as someone is not actively making up their data (which I realize has not been proven, in case this email gets subpoenaed!) then we do not need to worry about cooking results.

My reply: You raise several issues that we’ve discussed on occasion (for some links, see here):

1. The “Research Incumbency Rule”: Once an article is published in some approved venue, it is taken as truth. Criticisms which would absolutely derail a submission in pre-publication review can be brushed aside if they are presented after publication. This is what you call “the burden of proof on critics.”

2. Garden of forking paths.

3. Honesty and transparency are not enough. Work can be non-fraudulent but still be crap.

4. “Passive corruption” when people know there’s bad work but they don’t do anything about it.

5. A disturbingly casual attitude toward measurement; see here for an example: https://statmodeling.stat.columbia.edu/2023/10/05/no-this-paper-on-strip-clubs-and-sex-crimes-was-never-gonna-get-retracted-also-a-reminder-of-the-importance-of-data-quality-and-a-reflection-on-why-researchers-often-think-its-just-fine-to-publ/ Many economists and others seem to have been brainwashed into thinking that it’s ok to have bad measurement because attenuation bla bla . . . They’re wrong.

He responded: If you want an example of economists using stunningly bad data and making noises about attenuation, see here.

The paper in question has the straightforward title, “We Do Not Know the Population of Every Country in the World for the Past Two Thousand Years.”

Michael Wiebe has several new replications written up on his site.

Michael Wiebe writes:

I have several new replications written up on my site.

Moretti (2021) studies whether larger cities drive more innovation, but I find that the event study and instrumental variable results are due to coding errors. This means that the main OLS results should not be interpreted causally.

Atwood (2022) studies the long-term economic effects of the measles vaccine. I run an event study and find that the results are explained by trends, instead of a treatment effect of the vaccine.

I [Wiebe] am also launching a Patreon, so that I can work on replications full-time.

Interesting. We’ve discussed some of Wiebe’s investigations and questions in the past; see here, here, here, and here (on the topics of promotion in China, election forecasting, historical patents, and forking paths, respectively). So, good to hear that he’s still at it!

Postdoc at Washington State University on law-enforcement statistics

This looks potentially important:

The Center for Interdisciplinary Statistical Education and Research (CISER) at Washington State University (WSU) is excited to announce that it has an opening for a Post-Doctoral Research Associate (statistical scientist) supporting a new state-wide public data project focused on law enforcement. The successful candidate will be part of a team of researchers whose mission is to modernize public safety data collection through standardization, automation, and evaluation. The project will actively involve law enforcement agencies, state and local policymakers, researchers, and the public in data exploration and discovery. This effort will be accomplished in part by offering education and training opportunities fostering community-focused policing and collaborative learning sessions. The statistical scientist in this role will develop comprehensive educational materials, workshops, online courses, and training manuals designed to equip and empower law enforcement agencies, state and local policymakers, researchers, and the public with data and statistical literacy skills that will enable them to maximize the utility of the data project.

Data, education, and policy. Interesting.

Progress in 2023, Jessica Edition

Since Aki and Andrew are doing it… 

Published:

Unpublished/Preprints:

Performed:

If I had to choose a favorite (beyond the play, of course) it would be the rational agent benchmark paper, discussed here. But I also really like the causal quartets paper. The first aims to increase what we learn from experiments in empirical visualization and HCI through comparison to decision-theoretic benchmarks. The second aims to get people to think twice about what they’ve learned from an average treatment effect. Both have influenced what I’ve worked on since.

Pinker was right, I was wrong.

In an aside in an article from 2014, the psychologist Steven Pinker mocked professors for “wearing earth tones, driving Priuses, and having a foreign policy.”

As I wrote at the time, I didn’t know any professors who wore earth tones or drove Priuses, but I bristled at the “having a foreign policy”: as citizens of the United States (or as citizens of any other country), professors have as much right to political views as anyone else, and indeed I think it’s good for people to be informed on foreign policy and to participate publicly in politics.

There was some further discussion in comments, and that’s where that stood for me until yesterday, when I opened the newspaper and encountered this op-ed by Ezekiel “If I live to age 75, just kill me” Emanuel, on how “something is deeply wrong at America’s universities.” Along with recommending that every college student be required to take two ethics courses—I have some skepticism on that one, as I’m not sure where they’ll find the people to teach these classes—he writes:

The timidity of many university leaders in condemning the Hamas massacre and antisemitism more generally offers the wrong example. Leaders need to lead.

And now I see Pinker’s implicit point.

Let me explain. I have no problem with university leaders, or faculty more generally, or anyone else condemning the Hamas massacre and antisemitism more generally. While they’re at it, I’d be fine for them to condemn aggressive war, unequal political systems (back in the 1980s they called it “apartheid” and it was a big topic of campus protests), illegal dumping of toxic waste, government and corporate corruption, university hospitals that cover up sexual abuse by their doctors, all sorts of things. I’m not joking here: lots of bad things are going on in the world and I’m glad that people are fighting bad things, not just in their backyards but also globally. And there’s no reason that atrocities should crowd each other out. I don’t want to be in the position of saying that people shouldn’t speak up about atrocity A because, what about atrocity B over there? There’s a division of labor in politics as in many other things.

So, yeah, I’m fine with professors, or university leaders, or anyone else, “having a foreign policy” in the sense of expressing opinions, trying to influence policies, etc.

My problem with Emanuel’s statement is that he’s arguing that this sort of statement should be some sort of norm. He writes, “Leaders need to lead.” The trouble is, there are so many issues out there. How much “leading” does he demand that university leaders do? At a minimum, I guess this would include condemning the Hamas massacre and antisemitism more generally, Israeli settlements and anti-Arab racism more generally, and also every other newsworthy bit of violence around the world. I guess if there is enough demand for such statements, university leaders will make them, with the main criterion being a judgment of whether they will suffer more hassle from not making an official statement than from making one. But then there’s the question of where to stop, which atrocities to condemn. You wouldn’t want a university president to become the equivalent of an outrage-of-the-week blogger.

I would actually like university leaders to be more active in condemning the bad things done on their own campuses. None of these bad things reaches anything close to the level of a massacre or an invasion, but, still, they are cases where the leaders can actually make a difference.

So here’s where I agree with Pinker. Or, to be more precise, where I agree with his implicit point. The problem is not with academic leaders “having a foreign policy” in the sense of expressing opinions and seeking to achieve political change; it’s with expectations or demands such as Emanuel’s that academic leaders should have a foreign policy, that it’s part of the job, that it’s a matter of leadership. It just seems like one more demand that political activists are pushing. Again, I don’t object to university leaders taking a stand on this issue; what I reject is the idea that doing so is a necessary part of their job.

P.S. After writing the above post, I did some googling and found that Pinker indeed has maintained his position on this general point. He writes:

Should a university have a foreign policy? Was it given a mandate to tell a grateful nation what emotions to feel in response to a national event, or what the correct moral position is? At Harvard, many colleagues & I will urge the university administration to shut up and do their job of providing an impartial forum for profs & students to argue these issues . . .

I see his point. Again, my problem is not with university leaders, or business leaders, or church leaders, etc., expressing views, but rather with the idea that this is what they are supposed to do.

P.P.S. I still think I was right on this one, however.

Why isn’t Barack Obama out there giving political speeches?

I was thinking about the above question after seeing that the retired politician had posted a list of favorite songs, something he apparently does every year.

Obama is healthy, popular, and is OK being in the public eye. Presumably he has strong views about politics, and he’s a famously inspiring speaker (see for example here).

So why isn’t he traveling around the country giving speeches, firing up the base and reminding the moderates why they like him so much? I guess he’ll do some of this during the next election campaign, but why wait until then?

I talked with some people about this, and the consensus was that, campaigning aside, retired or out-of-office politicians give public speeches only very rarely. What exceptions are there? Donald Trump, of course, then before that Bill Clinton gave many speeches, but to private audiences. Churchill gave that Iron Curtain speech in Missouri in 1946. I recall reading somewhere that Herbert Hoover used to give speeches saying how the New Deal was socialistic and un-American . . . ok, here’s something:

In February of 1935, Hoover broke his public silence and began a national speaking tour . . . an average of one major speech every month until the Republican convention of 1936. He used the speeches to attack specific New Deal programs and took his speaking tour to every region of the nation. The campaign provoked a flurry of magazine and newspaper articles extolling the “New Hoover.” Tailoring his speeches for a nation-wide radio audience, he appeared to have finally understood the significance of the rhetorical presidency. . . .

Hoover’s understanding of the rhetorical presidency helped him as he crossed the nation. He prepared his speeches with the radio audience in mind, turning to a prearranged conclusion when his network speaking time was ending. He attacked New Deal programs each month, in a national speaking tour that logged 45,000 miles, covered 28 states, and crossed the continent at least fourteen times in a year and a half.

OK, so that didn’t work out for him.

Let’s go back to 1900 and look at the actions of former U.S. presidents:

William McKinley was assassinated, so no post-presidential career opportunities.

Theodore Roosevelt didn’t like the policies of his successor, William Howard Taft, and actually did go around giving public speeches saying so.

Taft followed up his presidency with work in academia and government. He doesn’t seem to have been the sort to give rabble-rousing speeches.

Woodrow Wilson suffered a stroke near the end of his presidency and was in no condition to give speeches. Warren Harding died in office. After leaving office, Calvin Coolidge gave some speeches during Hoover’s reelection bid, and then he died before Franklin Roosevelt began his presidency, so he had no opportunity to speak against the New Deal even if he wanted to.

Hoover we’ve already discussed. Roosevelt died in office. So the next president who could’ve speechified was Harry Truman, who lived for 20 years after his retirement. He wrote his memoirs and spent some effort grabbing whatever money was available. It does not seem that he made a habit of public speaking on political topics.

Dwight Eisenhower lived nearly a decade after retirement. According to wikipedia, he made a few campaign speeches during that period, and that was about it.

John Kennedy was in no position for public speaking after November 22, 1963. Lyndon Johnson was unpopular enough that you wouldn’t expect he’d be giving speeches after he left office; same for his successors Nixon, Ford, and Carter.

By the time Reagan retired he was too far gone to exercise his fabled oratorical skills. George H. W. Bush was always more of an insider than a communicator, so no surprise that he didn’t embark on any public speaking tours.

Bill Clinton we’ve already discussed. He campaigned for Hillary but didn’t otherwise give many public speeches.

Regarding his successor, wikipedia has this to say: “[George W.] Bush appeared on NBC’s The Tonight Show with Jay Leno on November 19, 2013, along with his wife Laura. When asked by Leno why he does not comment publicly about the Obama administration, Bush said, ‘I don’t think it’s good for the country to have a former president criticize his successor.’ Despite this statement, Bush vocally disagreed with Obama’s withdrawal of U.S. troops from Iraq in 2011, calling it a ‘strategic blunder.'” This seems par for the course for retired politicians, to offer strong opinions but not in the form of public speeches.

Obama and Trump we’ve discussed above.

What about other retired politicians? Congressmembers who want to stay in the game usually just stay in Congress forever, and when they retire, they’re really ready to step off the stage. Some former cabinet officers have remained in public view, for example Ramsey Clark, Henry Kissinger, and that riverboat gambler William Bennett, so it can happen, but mostly it seems that they fade away, either restricting themselves to print media or keeping their powder dry in the hope of future appointive office.

Summary

I still wonder why Obama gives a list of his favorite songs but doesn’t give public speeches. The man is famous for his speaking skills, and there are lots of issues on which he could speak with authority, ranging from abortion to immigration to health care to the federal budget to general economic and foreign policy. I’m not saying the former president necessarily has anything useful to add to the discourse right now, but public speaking isn’t just about engaging with ideas; it’s also about affecting the narrative. And Obama remains a celebrity—I think that if he were to schedule some speeches, he’d get a good turnout and tons of media attention. He’s got enough money that he could afford to give these speeches for free—it’s not like he’s so busy! OK, let’s go to wikipedia . . . it seems that he did a podcast with Bruce Springsteen, which is fine for maintaining his boomer cool, but not so relevant if his goal is political influence. And he “traveled to Australia as a part of his speaking tour of the country”? Huh?

So, why no U.S. tour, Barack? One answer is that, Trump aside, there’s not much of a history of public speaking by politicians who are out of office. It’s mostly just not done, maybe out of some general tradition of respect for alternation of power: when you’re in government, you get your shot to influence policy, and when you’re out, you stay out and you let the new team do their job. Which makes sense. But the rules seem to have changed.

The other thing is that this may be a relatively short-term opportunity. Or, to put it another way, I see diminishing returns from this sort of public speaking. Trump’s rallies got a lot of attention in large part because they were unusual. If Obama gets into the game and is followed up by others, so that celebrity-politician rallies are a regular event, then maybe nobody will care so much about them. For now, though, yeah, it just seems wack for Obama to be giving out music endorsements but not political speeches in front of adoring throngs.

Progress in 2023

Published:

Unpublished:

Enjoy.

The continuing challenge of poststratification when we don’t have full joint data on the population.

Torleif Halkjelsvik at the Norwegian Institute of Public Health writes:

Norway has very good register data (education/income/health/drugs/welfare/etc.) but it is difficult to obtain complete tables at the population level. It is however easy to get independent tables from different registries (e.g., age by gender by education as one data source and gender by age by welfare benefits as another). What if I first run a multilevel model to regularize predictions for a vast set of variables, but in the second step, instead of a full table, use a raking approach based on several independent post-stratification tables? Would that be a valid approach? And have you seen examples of this?

My reply: I think the right way to frame this is as a poststratification problem where you don’t have the full poststratification table, you only have some margins. The raking idea you propose could work, but to me it seems awkward in that it’s mixing different parts of the problem together. Instead I’d recommend first imputing a full poststrat table and then using this to do your poststratification. But then the question is how to do this. One approach is iterative proportional fitting (Deming and Stephan, 1940). I don’t know any clean examples of this sort of thing in the recent literature, but there might be something out there.
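To make the imputation idea concrete, here is a minimal sketch of iterative proportional fitting in base R. The dimensions, margins, and seed table are all invented for illustration; in Halkjelsvik’s setting the margins would come from the separate registry tables, and the seed could instead come from sample proportions or a regularized model.

## Minimal sketch of iterative proportional fitting (Deming and Stephan, 1940).
## All numbers are invented for illustration.

## Joint table dimensions: age group (3) x education (2) x welfare status (2)
dims <- c(age = 3, edu = 2, welfare = 2)

## Two margins that we pretend came from separate registries:
margin_age_edu     <- matrix(c(200, 100,
                               300, 250,
                               150, 100), nrow = 3, byrow = TRUE)
margin_age_welfare <- matrix(c(250,  50,
                               450, 100,
                               200,  50), nrow = 3, byrow = TRUE)

## Seed table: uniform here; cells known to be structurally empty could be
## set to 0, and the multiplicative updates below will keep them at 0.
tab <- array(1, dim = dims)

for (iter in 1:100) {
  ## Rescale so the age x education margin matches the first source
  cur <- apply(tab, c(1, 2), sum)
  tab <- sweep(tab, c(1, 2), margin_age_edu / cur, `*`)
  ## Rescale so the age x welfare margin matches the second source
  cur <- apply(tab, c(1, 3), sum)
  tab <- sweep(tab, c(1, 3), margin_age_welfare / cur, `*`)
}

## 'tab' is now an estimate of the full poststratification table;
## both margins are reproduced:
round(apply(tab, c(1, 2), sum))  # matches margin_age_edu
round(apply(tab, c(1, 3), sum))  # matches margin_age_welfare

Note that the separate margins have to agree on any dimensions they share (here, the age totals); otherwise the loop will keep cycling without settling on a single table.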

Halkjelsvik responded:

It is an interesting idea to impute a full poststrat table, but I wonder whether it is actually better than directly calculating weights using the proportions in the data itself. Cells that should be empty in the population (e.g., women, 80-90 years old, high education, sativa spray prescription) may not be empty in the imputed table when using iterative proportional fitting (IPF), and these “extreme” cells may have quite high or low predicted values. By using the data itself, such cells will be empty, and they will not “steal” any of the marginal proportions when using IPF. This is of course a problem in itself if the data is limited (if there are empty cells in the data that are not empty in the population).

Me: If you have information that certain cells are empty or nearly so, that’s information that you should include in the poststrat table. I think the IPF approach will be similar to the weighting; it is just more model-based. So if you think the IPF will give some wrong answers, that suggests you have additional information. I recommend you try to write down all the additional information you have and use all of it in constructing the poststratification table. This should allow you to do better than with any procedure that does not use this info.

Halkjelsvik:

After playing with a few scenarios (on a piece of paper, no simulation) I see that my suggested raking/weighting approach (which also would involve iterative proportional fitting) directly on the sample data is not a good idea in contexts where MRP is most relevant. That is, if the sample cell sizes are small and regularization matters, then the subgroups of interest (e.g. geographical regions) will likely have too little data on rare demographic combinations. The approach you suggested (full population table imputation based on margins) appears more reasonable, and the addition of “extra information” is obviously a good idea. But how about a hybrid: Instead of manually accounting for “extra information” (e.g., non-existing demographic combinations) this extra information can be derived directly from the proportions of the sample itself (across subgroups of interest) and can be used as “seed” values (i.e., before accounting for margins at the local level). Using information from the sample to create the initial (seed) values for the IPF may be a good way to avoid imputing positive values in cells that are structural zeros, given that the sample is sufficiently large to avoid too many “sample zeros” that are not true “structural zeros”.

So the following could be an approach for my problem?

1. Obtain regularized predictions from sample.

2. Produce a full poststrat seed table directly from “global” cell values in the sample (or from other available “global” data, e.g. if available only at national level). That is, regions start with identical seed structures.

3. Adjust the poststrat table by iterative proportional fitting based on local margins (but I have read that there may be convergence problems when there are many zeros in seed cells).

Me: I’m not sure! I really want to have a fully worked-out example, a case study of MRP where the population joint distribution (the poststratification table) is not known and it needs to be estimated from data. We’re always so sloppy in those settings. I’d like to do it with a full Bayesian model in Stan and then compare various approximations.
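This is nothing like that fully worked-out case study, but as a rough sketch of what steps 2 and 3 of the proposal above might look like, one could wrap the same IPF loop from the earlier sketch in a function and rake a shared, sample-based seed table to each region’s margins. Again, every number below is invented.

## Rough sketch of steps 2-3 above: a shared seed table from "global" sample
## proportions, raked separately to each region's margins. Toy numbers only.

rake <- function(seed, margin_ae, margin_aw, n_iter = 100) {
  tab <- seed
  for (iter in 1:n_iter) {
    cur <- apply(tab, c(1, 2), sum)
    tab <- sweep(tab, c(1, 2), ifelse(cur > 0, margin_ae / cur, 0), `*`)
    cur <- apply(tab, c(1, 3), sum)
    tab <- sweep(tab, c(1, 3), ifelse(cur > 0, margin_aw / cur, 0), `*`)
  }
  tab
}

## Step 2: one seed table from "global" sample cell counts
## (age 3 x education 2 x welfare 2); sample zeros act as structural zeros.
global_seed <- array(c(40, 60, 25,  20, 45, 15,
                       10, 25,  5,   0,  5,  0), dim = c(3, 2, 2))

## Step 3: rake the same seed to each region's own margins.
region_margins <- list(
  north = list(ae = matrix(c(200, 100, 300, 250, 150, 100), 3, byrow = TRUE),
               aw = matrix(c(250,  50, 450, 100, 200,  50), 3, byrow = TRUE)),
  south = list(ae = matrix(c(100,  80, 150, 120,  90,  60), 3, byrow = TRUE),
               aw = matrix(c(150,  30, 230,  40, 120,  30), 3, byrow = TRUE))
)
poststrat_tables <- lapply(region_margins, function(m) rake(global_seed, m$ae, m$aw))

Whether the sample is large enough for its zero cells to be trusted as structural zeros is exactly the issue Halkjelsvik raises above.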

“Integrated Inferences: Causal Models for Qualitative and Mixed-Method Research”

Macartan Humphreys and Alan Jacobs write:

We are delighted to announce the publication of our new book, Integrated Inferences: Causal Models for Qualitative and Mixed-Method Research.

This book has been quite a few years in the making, but we are really happy with how it has turned out and hope you will find it useful for your research and your teaching.

Integrated Inferences provides an introduction to fundamental principles of causal inference and Bayesian updating and shows how these tools can be used to implement and justify inferences using within-case (process tracing) evidence, correlational patterns across many cases, or a mix of the two. The basic idea builds on work by Pearl, Gelman, and others. If we can represent theories graphically – as causal models – we can then update our beliefs about these models using Bayesian methods, and then draw inferences about populations or cases from different types of data. We also demonstrate how causal models can guide research design, informing choices about which cases, observations, and mixes of methods will be most useful for addressing any given question.

We’ve attached Chapter 1 in case you want to have a look. See also https://integrated-inferences.github.io/ for resources including a link to a full open access version of the book.

The book has an accompanying R package on CRAN: CausalQueries. The package can be used to make, update, and query causal models using quite simple syntax. We’d be delighted if you took it for a spin. There’s a draft guide to the package here (Tietz et al).
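I haven’t used the package myself, but from a quick look at its documentation the make/update/query workflow they describe looks roughly like the sketch below. The model, data, and query are toy examples, and the exact argument names should be checked against the current CausalQueries docs.

## Rough sketch of the CausalQueries workflow described above (toy example;
## function usage based on my reading of the package documentation).
library(CausalQueries)

## Declare a simple causal model: X affects Y
model <- make_model("X -> Y")

## Invented binary data for a handful of cases
dat <- data.frame(X = c(0, 0, 1, 1, 1), Y = c(0, 1, 0, 1, 1))

## Update beliefs about the model given the data (this runs Stan)
model <- update_model(model, dat)

## Query the updated model, e.g. the average effect of X on Y
query_model(model, "Y[X=1] - Y[X=0]", using = "posteriors")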

The only thing about the above note that confuses me is the “but” in “This book has been quite a few years in the making, but we are really happy with how it has turned out…” I assume that it took years because they were trying to make it just right, so it’s no surprise that they’re happy with it, conditional on them finally being ready to release it. The “but” seems to imply some sort of discordance which I don’t see here.

Celebrity scientists and the McKinsey bagmen

Josh Marshall writes:

Trump doesn’t think of truth or lies the way you or I do. Most imperfect people, which is to say all of us, exist in a tension between what we believe is true and what is good for or pleasing to us. If we have strong character we hew closely to the former, both in what we say to others and what we say to ourselves. The key to understanding Trump is that it’s not that he hews toward the latter. It’s that the tension doesn’t exist. What he says is simply what works for him. Whether it’s true is irrelevant and I suspect isn’t even part of Trump’s internal dialog. It’s like asking an actor whether she really loved her husband like she claimed in her blockbuster movie or whether she was lying. It’s a nonsensical question. She was acting.

The analogy to the actor is a good one.

Regarding the general sort of attitude and behavior discussed here, though, I don’t think Trump stands out as much as Marshall implies. Even setting aside other politicians, who in the matter of lying often seem to differ from the former president more in degree than kind, I feel like I’ve seen the same sort of thing with researchers, which is one reason I think Clarke’s law (“Any sufficiently crappy research is indistinguishable from fraud”) is so often relevant.

When talking about researchers who don’t seem to care about saying the truth, I’m not just talking about various notorious flat-out data fakers. I’m also talking about researchers who just do unreplicable crap or who make claims in the titles and abstracts of their papers that aren’t supported by their data. We get lots of statements that are meaningless or flat-out false.

Does the truth matter to these people? I don’t know. I think they believe in some things they view as deeper truths: (a) their vague models of how the world works are correct, and (b) they are righteous people. Once you start there, all the false statements don’t matter, as they are all being done in the service of a larger truth.

I don’t think everyone acts this way—I have the impression that most people, as Marshall puts it, “exist in a tension between what we believe is true and what is good for or pleasing to us.” There’s just a big chunk of people—including many academic researchers, journalists, politicians, etc.—who don’t seem to feel that tension. As I’ve sometimes put it, they choose what to say or what to write based on the music, not the words. And they see the rest of us as “schoolmarms” or “Stasi”—pedants who get in the way of the Great Men of science. Not the same as Donald Trump by a longshot, but I see some similarities in that it’s kinda hard to pin them down when it comes to factual beliefs. It’s much more about whose-side-are-you-on.

Also incentives: it’s not so much that people lie because of incentives, as that incentives affect the tough calls they make, and incentives affect who succeeds in climbing the greasy pole of success.

Concerns about misconduct at Harvard’s department of government and elsewhere

The post below addresses a bunch of specifics about Harvard, but for the key point, jump to the last paragraph of the post.

Problems about Harvard

A colleague pointed me to this post by Christopher Brunet, “The Curious Case of Claudine Gay,” and asked what I thought. It was interesting. I’ve met or corresponded with almost all the people involved, at some time or another. Here’s my take:

– There’s a claim that Harvard political science professor Ryan Enos falsified a dataset. I looked at this a while ago. I thought I’d blogged it but I couldn’t find it in a google search. There’s a pretty good summary by journalist Jesse Singal here. I corresponded with both Singal and Brunet on this one. As I wrote, “I’d say that work followed standard operating procedure of that era which indeed was to draw overly strong conclusions from quantitative data using forking paths.” I don’t think it’s appropriate to say that someone falsified data, just because they did an analysis that (a) had data issues and (b) came to a conclusion that doesn’t make you happy. Data issues come up all the time.

– There’s a claim that Gay “swept this [Enos investigation] under the rug” (see here). This reminds me of my complaint about the University of California not taking seriously the concerns about the publications of Matthew “Why We Sleep” Walker (see here). A common thread is that universities don’t like to discipline their tenured professors! Also, though, I wasn’t convinced by the claim that Enos committed research misconduct. The Walker case seems more clear to me. But, even with the Walker case, it’s possible to come up with excuses.

– There’s a criticism that Gay’s research record is thin. That could be. I haven’t read her papers carefully. I guess that a lot of academic administrators are known more for their administration than their research records. Brunet writes, “A prerequisite for being a Dean at Harvard is having a track record of research excellence.” I guess that’s the case sometimes, maybe not other times. Lee Bollinger did a lot as president of the University of Michigan and then Columbia, but I don’t think he’s known for his research. He published some law review articles once upon a time? Brunet refers to Gay being an “affirmative action case,” but that seems kind of irrelevant given that lots of white people reach academic heights without doing influential research.

– There’s a criticism of a 2011 paper by Dustin Tingley, which has the line, “Standard errors clustered at individual level and confidence intervals calculated using a parametric bootstrap running for 1000 iterations,” but Brunet says, “when you actually download his R code, there is no bootstrapping.” I guess, maybe? I clicked through and found the R code here, but I don’t know how the “zelig” package works. Brunet writes that Tingley “grossly misrepresented the research processes by claiming his reported SEs are bootstrap estimates clustered at the individual level. As per the Zelig documentation, no such bootstrapping functionality ever existed in his chosen probit regression package.” I googled and it seemed that zelig did have bootstrapping, but maybe not with clustering. I have no idea what’s going on here: it could be a misunderstanding of the software on Brunet’s part, a misunderstanding on Tingley’s part, or some statistical subtlety. I’m not really into this whole clustered standard errors thing anyway. My guess is that there was some confusion regarding what is a “bootstrap,” and it makes sense that a journalist coming at this from the outside might miss some things. The statistical analysis in this 2011 paper can be questioned, as is usually the case with anyone’s statistical analysis when they’re working on an applied research frontier. For example, from p.12 of Tingley’s paper: “Looking at the second repetition of the experiment, after which subjects had some experience with the strategic context, there was a significant difference in rejection rates across the treatments in the direction predicted by the model (51% when delta = 0.3 and 63% when delta = 0.7) (N = 396, t = 1.37, p = .08). Pooling offers of all sizes together I find reasonable support for Hypothesis 1 that a higher proportion of offers will be rejected, leading to both players paying a cost, when the shadow of the future was higher.” I’m not a fan of this sort of statistical-significance-based claim, including labeling p = .08 as “reasonable support” for a hypothesis, but this is business as usual in the social sciences.

– There’s a bunch of things about internal Harvard politics. I have zero knowledge one way or another regarding internal Harvard politics. What Brunet is saying there could be true, or maybe not, as he’s relying on various anonymous sources and other people with axes to grind. For example, he writes, “Gay did not recuse herself from investigating Enos. Rather, she used the opportunity to aggressively cover up his research misconduct.” I have no idea what standard policy is here. If she had recused herself, maybe she could be criticized for avoiding the topic. For another example, Brunet writes, “Claudine Gay allowed Michael Smith to get away scot-free in the Harvard-Epstein ties investigation — she came in and nicely whitewashed it all away. Claudine Gay has Epstein coverup stink on her, and Michael Smith has major Epstein stink on him,” and this could be a real issue, or it could just be a bunch of associations, as he doesn’t actually quote from the Harvard-Epstein ties investigation to which he refers. Regarding Jorge Dominguez: as Brunet says, the guy had been around for decades—indeed, I heard about his sexual harassment scandal back when I was a Ph.D. student at Harvard, I think it was in the student newspaper at the time, and I also remember being stunned, not so much that it happened, but that the political science faculty at the time just didn’t seem to care—so it’s kind of weird how first Brunet (rightly) criticizes the Government “department culture” that allowed a harasser to stay around for so long, and then he criticizes Smith for “protecting Dominguez” and criticizes Gay for being “partly responsible for having done nothing to address Dominguez’s abuses”—but then he also characterizes Smith as having “decided to throw [Dominguez] under the bus.” You can’t have it both ways! Responding to a decades-long harassment campaign is not “throwing someone under the bus.” Regarding Roland Fryer, Brunet quotes various politically-motivated people complimenting Fryer, which is fine—the guy did some influential research—but no context is added by referring to Fryer as “a mortal threat to some of the most powerful black people at Harvard” and referring to Gay as “a silky-smooth corporate operator.” Similarly, the Harvey Weinstein thing is something that can go both ways: if Gay criticizes a law professor who chooses to defend Weinstein, then she “was driven by pure spite. She is a petty and vicious little woman.” If she had supported the prof, I can see the argument the other way: so much corruption, she turns a blind eye to Epstein and then to Weinstein, why is she attacking Fryer but defending the law professor who is defending the “scumbag,” etc.

It’s everywhere

Here’s my summary. I think if you look carefully at just about any university social-science department, you’re likely to find some questionable work, some faculty who do very little research, and some administrators who specialize in administration rather than research, as well as lots and lots of empirical papers with data challenges and less than state-of-the-art statistical analyses. You also might well find some connections to funders who made their money in criminal enterprises, business and law professors who work for bad guys, and long-tolerated sexual harassers. I also expect you can find all of this in private industry and government; we just might not hear about it. Universities have a standard of openness that allows us to see the problems, in part because universities have lots of graduates who can spill the beans without fear of repercussions. Also, universities produce public documents. For example, the aforementioned Matthew Walker wrote Why We Sleep. The evidence of his research misconduct is right out there. In a government or corporate context, the bad stuff can stay inside internal documents.

Executive but no legislative and no judicial

There’s also the problem that universities, and corporations, have an executive branch but no serious legislative or judicial branches. I’ve seen a lot of cases of malfeasance within universities where nothing is done, or where whatever is done is too little, too late. I attribute much of this problem to the lack of legislative and judicial functions. Stepping back, we could think of this as a problem with pure utilitarianism. In a system of separated powers, each institution plays a role. The role of the judicial system is to judge without concern about policy consequences. In the university (or a corporation), there is only the executive, and it’s hard for the executive to make a decision without thinking about consequences. Executives will accept malfeasance of all sorts because they decide that the cost of addressing the malfeasance is greater than the expected benefit. I’m not even just talking here about research misconduct, sexual harassment, or illegal activities by donors; other issues that arise range from misappropriation of grant money to violation of internal procedures to corruption in the facilities department.

To get back to research for a moment, there’s also the incentive structure that favors publication. Many years ago I had a colleague who showed me a paper he’d written that was accepted for publication in a top journal. I took a look and realized it had a fatal error–not a mathematical error, exactly, more of a conceptual error so that his method wasn’t doing what he was claiming it was doing. I pointed it out to him and said something like, “Hey, you just dodged a bullet–you almost published a paper that was wrong.” I assumed he’d contact the journal and withdraw the article. But, no, he just let the publication process go as scheduled: it gave him another paper on his C.V. And, back then, C.V.’s were a lot shorter; one publication could make a real difference! That’s just one story; the point is that, yes, of course a lot of fatally flawed work is out there.

So, yeah, pull up any institutional rock and you’re likely to find some worms crawling underneath. It’s good for people to pull up rocks! So, fair enough for Brunet to write these posts. And it’s good to have lots of people looking into these things, from all directions. The things that I don’t buy are his claims that there is clear research misconduct by Enos and Tingley, and his attempt to tie all these things together to Gay or to Harvard more generally. There’s a paper from 2014 with some data problems, a paper from 2011 by a different professor from the same (large) department that used some software that does bootstrapping, a professor in a completely different department who got donations from a criminal, a political science professor and an economics professor with sexual harassment allegations, a law professor who was defending a rich, well-connected rapist . . . and Brunet is criticizing Gay for being too lenient in some of these cases and too strict in others. Take anyone who’s an administrator at a large institution and you’ll probably find a lot of judgment calls.

Lots of dots

To put it another way, it’s fine to pick out a paper published in 2014 with data problems and a paper published in 2011 with methods that are not described in full detail. Without much effort it should be possible to find hundreds of examples from Harvard alone that are worse. Much worse. Here are just a few of the more notorious examples:

“Stereotype susceptibility: Identity salience and shifts in quantitative performance,” by Shih, Pittinsky, and Ambady (1999)

“This Old Stereotype: The Pervasiveness and Persistence of the Elderly Stereotype,” by Cuddy, Norton, and Fiske (2005)

“Rule learning by cotton-top tamarins,” by Hauser, Weiss, and Marcus (2006)

“Signing at the beginning makes ethics salient and decreases dishonest self-reports in comparison to signing at the end,” by Shu, Mazar, Gino, Ariely, and Bazerman (2012)

“Jesus said to them, ‘My wife…’: A New Coptic Papyrus Fragment,” by King (2014)

“Physical and situational inequality on airplanes predicts air rage,” by DeCelles and Norton (2016).

“the replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%” (not actually in a published paper, but in a press release featuring two Harvard professors from 2016)

“Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis,” by Mehra, Ruschitzka, and Patel (2020).

It’s not Brunet’s job, or mine, or anyone’s, to look at all these examples, and it’s fine for Brunet to focus on two much more ambiguous cases of problematic research papers. The point of my examples above is to put some of his connect-the-dots exercises into perspective. Harvard, and any other top university, will have hundreds of “dots”—bad papers, scandals, harassment, misconduct, etc.—that can be connected in many different ways.

A problem of quality control

We can see this as a problem of quality control. A large university is going to have some rate of iffy research, sexual harassment, tainted donations (and see here for a pointer to a horrible Harvard defense of that), faculty who work for bad people, etc., and it’s not really set up to handle this. Indeed, a top university such as Harvard or USC could well be more likely to have such problems: its faculty are more successful, so even their weak work can get publicity; its superstars might be more likely to get away with sexual harassment (though it seems that even the non-tenure-track faculty at such places can be protected by the old boys’ network); top universities could be more likely to get big donations from rich criminals; and they could also have well-connected business and law professors who’d like to make money defending bad guys (back at the University of California we had a professor who was working for the O. J. Simpson defense team!). I’ve heard a rumor that top universities can even cheat on their college rankings. And, again, universities have no serious legislative or judicial institutions, so the administrators at any top university will find themselves dealing with an unending stream of complaints regarding research misconduct, sexual harassment, tainted donations, and questionable outside activities by faculty, not to mention everyday graft of various sorts. I’m pretty sure all this is happening in companies too; we just don’t usually hear so much about it. Regarding the case of Harvard’s political science department, I appreciate Brunet’s efforts to bring attention to various issues, even if I am not convinced by several of his detailed claims and am not at all convinced by his attempt to paint this all as a big picture involving Gay.

In July 2015 I was spectacularly wrong

See here.

Also interesting was this question that I just shrugged aside:

If a candidate succeeds in winning a nomination and goes on to win the election and reside in the White House do they have to give up their business interests as these would be seen as a conflict of interest? Can a US president serve in office and still have massive commercial business interests abroad?