Leap Day Special!

The above graph is from a few years ago but is particularly relevant today!

It’s funny that, in leap years, approximately 10% fewer babies are born on 29 Feb than you’d expect from the surrounding days. I think it would be cool to have a Leap Day birthday. But I guess most people, not being nerds, would prefer the less-“weird” days before and after.

There’s lots of good stuff at the above link; I encourage you to read the whole thing.

In the years since, we’ve improved Stan so we can fit and improve the birthdays time series decomposition model using full Bayesian inference.

Here’s Aki’s birthday case study which has all the details. This will also be going into our Bayesian Workflow book.

“Exclusive: Embattled dean accused of plagiarism in NSF report” (yup, it’s the torment executioners)

The story is at Retraction Watch:

Erick Jones, the dean of engineering at the University of Nevada in Reno, appears to have engaged in extensive plagiarism in the final report he submitted to the National Science Foundation for a grant, Retraction Watch has learned.

The $28,238 grant partially supported a three-day workshop that Jones and his wife, Felicia Jefferson, held for 21 students in Washington, DC, in April 2022 titled “Broadening Participation in Engineering through Improved Financial Literacy.” Jefferson received a separate award for $21,757.

Jones submitted his final report to the agency in May 2023. Retraction Watch obtained a copy of that report through a public records request to Jones’s previous employer, the University of Texas at Arlington, and identified three published sources of extended passages he used without citation or quotation marks. . . .

Lots more details at the link.

Those torment executioners keep on tormenting us.

In all seriousness, between the University of Nevada salary and the National Science Foundation grants, this guy’s been taking a lot of public funds to produce some really bad work. Seems like a real failure of oversight at UNR and NSF to let this go on like this.

Good work by Retraction Watch to follow up on this story.

P.S. I forgot to include the quotations from UNR luminaries:

“In Erick Jones, our University has a dynamic leader who understands how to seize moments of opportunity in order to further an agenda of excellence,” University President Brian Sandoval said.

“What is exciting about having Jones as our new dean for the College of Engineering is how he clearly understands the current landscape for what it means to be a Carnegie R1 ‘Very High Research’ institution,” Provost Jeff Thompson said. “He very clearly understands how we can amplify every aspect of our College of Engineering, so that we can continue to build transcendent programs for engineering education and research.”

Also this:

Jones was on a three-year rotating detail at the National Science Foundation, where he was a Program Director in the Engineering Directorate for the Engineering Research Centers Program.

Shameful that he would work for NSF and then pay that back by taking their money and submitting a plagiarized report. But, hey, I guess that’s what University President Brian Sandoval would call “understanding how to seize moments of opportunity in order to further an agenda of excellence.”

What could be more excellent than taking government funds and using them to publish plagiarized reports and crayon drawings?

It sounds like it’s fine with UNR if their dean of engineering does this. I wonder what would happen to any UNR students who did this sort of thing? I guess they wouldn’t get paid $372,127 for it, but maybe the university could at least give them a discount on their tuition?

P.P.S. That all said, let’s not forget that there are much worse cases of corruption out there. The UNR case just particularly bothers me, partly because it’s close to what I do—except that when my colleagues get NSF funds, we don’t use them to produce plagiarized reports—and partly because the problems are so obvious: as discussed in our earlier post, you can look at the papers this dean of engineering had published and see that they are incoherent and have no content, even before getting into the plagiarism. It’s hard to believe that his hiring was a mere lack of oversight; you’d have to work really hard to not see the problems in his publications. But, yeah, there’s lots of much worse stuff going on that we read about in the newspaper every day.

Varying slopes and intercepts in Stan: still painful in 2024

Andrew recently blogged the following: Tutorial on varying-intercept, varying-slope multilevel models in Stan, from Will Hipson. This is the kind of model Andrew et al. used for one example in Red State, Blue State, namely the varying effect of income on Republican preference across states. Each state has its own slope and intercept, tied together with a multivariate hierarchical prior. The version in Gelman and Hill’s regression book is a hack that tried to scale an inverse Wishart; the LKJ prior is what they would have used if Ben Goodrich had created it by then.

Andrew points to a tutorial on Bayesian varying effects models from Will Hipson, which is really nice in the way it steps through workflow, building up the model in stages. The model Hipson develops is an improvement on what we have in our User’s Guide. After everything else, I circle back and talk about doc, trying to connect it to my recent post on why doc is so dangerous.

I think we can do a bit better in the current version of Stan, but I have to confess up front that Andrew’s right—this is still painful. It took me around three hours to put together the model, the simulations, and this blog post, and I’m the one who designed the language! It would’ve been much faster if I hadn’t been trying to bring it up to a “publishable” standard as an example of how I like to see Stan code written.

The original Stan model

Here’s Will Hipson’s model:

data {
  int N_obs; // number of observations
  int N_pts; // number of participants
  int K; // number of predictors + intercept
  int pid[N_obs]; // participant id vector
  matrix[N_obs, K] x; // matrix of predictors
  real y[N_obs]; // y vector
}

parameters {
  matrix[K, N_pts] z_p; // matrix of intercepts and slope
  vector[K] sigma_p; // sd for intercept and slope
  vector[K] beta; // intercept and slope hyper-priors
  cholesky_factor_corr[K] L_p; // Cholesky correlation matrix
  real sigma; // population sigma
}

transformed parameters {
  matrix[K, N_pts] z; // non-centered version of beta_p
  z = diag_pre_multiply(sigma_p, L_p) * z_p; 
}

model {
  vector[N_obs] mu;
  
  // priors
  beta ~ normal(0, 1);
  sigma ~ exponential(1);
  sigma_p ~ exponential(1);
  L_p ~ lkj_corr_cholesky(2);
  to_vector(z_p) ~ normal(0, 1);
  
  // likelihood
  for(i in 1:N_obs) {
    mu[i] = beta[1] + z[1, pid[i]] + (beta[2] + z[2, pid[i]]) * x[i, 2];
  }
  y ~ normal(mu, sigma);
}

generated quantities {
  matrix[2, 2] Omega;
  Omega = multiply_lower_tri_self_transpose(L_p);
}

Warning: There’s a bug in this code in that it only handles the K = 2 case. You can see this with the 1 and 2 hardcoded in the definition of mu[i].

My Stan model

The documentation for the model is at the top of the Stan code; after that, the Stan code has only a single line of doc other than explanations of the variables (which I wouldn’t include in non-tutorial code, just to link this back to what I was saying a few posts ago about comments).

/**
 * Varying slopes and intercept hierarchical linear regression.
 * N observations organized into J groups, with jj[n] being the group
 * and x[n, 1:K] the covariates for observation n.  The covariate
 * matrix x should include a column of 1s to include an intercept.
 * 
 * The slopes and intercept per group have a multivariate normal prior
 * and the scale has an exponential prior.  The location of the
 * multivariate normal prior has a standard normal hyperprior and its
 * covariance is decomposed into a correlation matrix with an LKJ
 * hyperprior and a scale vector with an exponential hyperprior. In
 * symbols: 
 *
 * Likelihood:
 *   y[n] ~ normal(x[n] * beta[1:K, jj[n]], sigma) for n in 1:N
 *
 * Priors:
 *   sigma ~ exponential(1)
 *   beta[1:K, j] ~ multi_normal(nu, Sigma) for j in 1:J
 * 
 * Hyperpriors:
 *   nu ~ normal(0, 1)
 *   scale(Sigma) ~ exponential(1)
 *   corr(Sigma) ~ lkj(2)
 *
 * where scale(Sigma) is the scale vector and corr(Sigma) is the
 * correlation matrix of Sigma.
 *
 * For efficiency and numerical stability, the covariance and
 * correlation matrices are Cholesky factored.
 */
data {
  int<lower=0> J;                      // number of groups
  int<lower=0> N;                      // number of observations
  array[N] int<lower=1, upper=J> jj;   // group per observation
  int<lower=1> K;                      // number of covariates
  matrix[N, K] x;                      // data matrix
  vector[N] y;                         // observations
}
parameters {
  vector[K] nu;                        // location of beta[ , j]
  vector<lower=0>[K] tau;              // scale of beta[ , j]
  cholesky_factor_corr[K] L_Omega;     // Cholesky of correlation of beta[ , j]
  matrix[K, J] beta_std;               // standard beta (beta - nu) / Sigma
  real<lower=0> sigma;                 // observation error for y
}
transformed parameters {
  matrix[K, J] beta = rep_matrix(nu, J)
                      + diag_pre_multiply(tau, L_Omega) * beta_std;
}
model {
  nu ~ normal(0, 1);
  tau ~ exponential(1);
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(beta_std) ~ normal(0, 1);  // beta[ , j] ~ multi_normal(nu, Sigma)
  sigma ~ exponential(1);
  y ~ normal(rows_dot_product(x, beta[ , jj]'), sigma);
}
generated quantities {
  matrix[K, K] Sigma                   // covariance of beta[, j]
    = multiply_lower_tri_self_transpose(diag_pre_multiply(tau, L_Omega));
}

(WordPress is really annoying in its mishandling of angle brackets in pre environments.)

I started the first version using K = 2 and loops rather than vectorization. Next, I generalized from K = 2 to general K. Then I found the rows_dot_product function and got rid of the last loop. (Loops are fast in Stan—it’s the redundant autodiff, such as multiple calculations of log(sigma), that is the time waster in non-vectorized code.)
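In case it helps to see what the rows_dot_product call in the model block is computing, here is a small numpy sketch of the same operation. This is my own illustration with made-up sizes, not code from Hipson’s post or from the Stan documentation.

import numpy as np

# Hypothetical sizes: N observations, K covariates, J groups.
N, K, J = 4, 2, 3
rng = np.random.default_rng(0)
x = rng.normal(size=(N, K))        # covariate matrix, N x K
beta = rng.normal(size=(K, J))     # coefficients per group, K x J
jj = np.array([0, 2, 1, 0])        # group index per observation (0-based here)

# rows_dot_product(x, beta[ , jj]') takes the dot product of each row of x
# with the coefficient column for that row's group:
mu = (x * beta[:, jj].T).sum(axis=1)   # same as [x[n] @ beta[:, jj[n]] for n in range(N)]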

One could make the case for suffixing all the parameters of the prior for beta using _beta. You’d probably want to do that in a context with multiple groupings. It gets very hard to read even in this simple case—with multiple groupings it’s a right royal mess. Long variable names are very hard to read in math.

I put all the comments to the right at the ends of lines, so they don’t waste any vertical space and don’t get in the way of actually reading the code—except for the one true inline comment I’d keep, which notes what the non-centered parameterization implies for the distribution of beta.

Let’s take it for a spin

I actually wrote the model first. I would generally recommend building and debugging in stages, but I’ve built this model many times, so I decided to roll the dice.

What I haven’t done before is simulate data in Python. Luckily, I have a very helpful coding assistant named ChatGPT. I have a system prompt that looks like this (I’m not actually a professor any more, but I thought it’d be easier to understand that way).

WHO: I am a professor in statistics and machine learning with a Ph.D. in cognitive science and computer science. I understand (a) graduate level probability and statistics, (b) graduate-level machine learning including neural networks, (c) applied mathematics including calculus and analysis, matrices and linear algebra, combinatorics and graph theory, mathematical logic and set theory, (c) cognitive science including psycholinguistics and cognitive anthropology, (d) linguistics including phonology, morphology, syntax, semantics, and pragmatics, and (e) philosophy including 20th century analytic philosophy and philosophy of mind.

HOW: Respond in a concise and neutral academic style without embellishments.

Please call me “Bob”.

This was a one-and-done prompt. I’ve also had a bit of practice formulating these queries.

BOB: In Python, I want to set K = 5 covariates and J = 10 groups. I want to generate a random K-vector location nu from a standard normal, and a (K x K) random covariance matrix Sigma of roughly unit scale (multiplying Cholesky factors or taking entries from a random walk would be OK). Then generate a K x J matrix beta where the columns are drawn from a multivariate normal distribution with location nu and covariance Sigma.

Then I want to set N = 1000 observations. For each observation n in range(N), generate a random group jj[n] in range(J) uniformly. Take a scale sigma = 1.5 for errors, and generate an N x K standard normal matrix x of covariates, with the first column set to all 1 values for an intercept. Then generate y from a hierarchical linear regression where the coefficient vector for observation n is given by beta[:, jj[n]] and the noise is normal with location 0 and scale sigma.

GPT-4 (Feb 28, 2024): [It just led with the code, which is unusual.]

import numpy as np
import scipy.linalg

# Set parameters
K = 5  # number of covariates
J = 10  # number of groups
N = 1000  # number of observations
sigma = 1.5  # scale of errors

# Generate nu and Sigma
nu = np.random.normal(0, 1, K)  # K-vector location from a standard normal
A = np.random.normal(0, 1, (K, K))
Sigma = np.dot(A, A.T)  # create a positive semi-definite matrix for covariance

# Generate beta
beta = np.random.multivariate_normal(nu, Sigma, J).T  # K x J matrix

# Generate observations
jj = np.random.randint(0, J, N)  # random group for each observation
x = np.random.normal(0, 1, (N, K))  # N x K standard normal matrix
x[:, 0] = 1  # set first column to all 1s for intercept

# Generate y
y = np.zeros(N)
for n in range(N):
    y[n] = np.dot(x[n], beta[:, jj[n]]) + np.random.normal(0, sigma)

y[:5]  # Show the first 5 values of y as a sample output

The hierarchical linear regression model was successfully generated with the specified parameters. The first five values of the dependent variable y from the model are approximately:

−2.27
8.86
8.20
1.30
7.27

These values are based on the simulation with K=5 covariates,
J=10 groups, N=1000 observations, and an error scale σ=1.5.

It looks like it actually ran that Python code? I haven’t seen output in this format before from GPT—the code’s in a pulldown labeled “Analyzing”.

Anyway, GPT’s code worked out of the box. It only took a couple of minutes to pull the data generation into a function and return a dictionary with simulations and simulated parameter values to compare to the fitted values. My Python code illustrates my suggestion a few posts back to turn blocks of code into functions rather than putting a bunch of comments around them in the script.

import numpy as np
import scipy.linalg

def sim_param_data(K, J, N, sigma):
    nu = np.random.normal(0, 1, K)
    A = np.random.normal(0, 1, (K, K))
    Sigma = np.dot(A, A.T)
    beta = np.random.multivariate_normal(nu, Sigma, J).T 
    jj = np.random.randint(0, J, N)
    x = np.random.normal(0, 1, (N, K))
    x[:, 0] = 1
    y = np.zeros(N)
    for n in range(N):
        y[n] = np.dot(x[n], beta[:, jj[n]]) + np.random.normal(0, sigma)
    return nu, Sigma, beta, {'J': J, 'N': N, 'jj': jj + 1, 'K': K, 'x': x, 'y': y}
    
import cmdstanpy as csp
nu, Sigma, beta, data = sim_param_data(K = 5, J = 10, N = 1000, sigma = 1.5)
model = csp.CmdStanModel(stan_file = 'rsbs.stan')
fit = model.sample(data)

It takes Stan about 20 seconds to fit this data set; the R-hats are all less than 1.01, the ESS is in the thousands from a sample of size 4000, and all but a couple of parameters are recovered within their 95% posterior intervals. There is quite a lot of uncertainty here with this little data and this many groups—don’t take those point estimates of covariance too seriously!
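Here is one way to do that kind of recovery check, as a rough sketch using cmdstanpy’s stan_variable accessor. The names fit and beta refer to the simulation and fitting code above, and the exact checks you’d want depend on the model.

import numpy as np

# Posterior draws of beta come back as an array of shape (num_draws, K, J).
beta_draws = fit.stan_variable('beta')
beta_hat = beta_draws.mean(axis=0)                        # posterior means
lo, hi = np.quantile(beta_draws, [0.025, 0.975], axis=0)  # central 95% intervals

print("largest |posterior mean - true beta|:", np.abs(beta_hat - beta).max())
print("fraction of true betas inside 95% intervals:",
      ((beta >= lo) & (beta <= hi)).mean())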

Appendix on doc

Let’s digress and talk about doc. I wrote a blog post a few days ago on doc, and this example triggers some of the same issues. I want to say up front that doc is hard, and if you go and look at code I’ve written, there will be a lot of places where you can improve the doc. Same for the code. So this is a kind of normative theory of doc, not what one might expect in reality. People only have a finite amount of time for any project. You might want to take a look at the doc in the R parts of his example with the same eye.

First, there’s a scaffolding example which has the classic problem of documentation just for the sake of documentation.

vector[N] mu; // declaring a mu vector

You see the same thing in the final example where “vector[N] y” is documented as “y vector”. For the same reason, I don’t like this from an early example,

  sigma ~ exponential(1); // using an exponential prior on sigma

And this is what I meant by documenting the language.

mu[i] = x[i] * beta; // * is matrix multiplication in this context

and

  cholesky_factor_corr[K] L_p; // Cholesky correlation matrix

Going back to the final example, rather than “population sigma”, I would prefer “error scale” as it does not rely on the conventional notation sigma to pick out the scale.

The comment for z says “non-centered version of beta_p”, but the non-centered variable here is z_p. The terminology of “centering” is around the population mean, not zero.

Continuing with the doc for z, I don’t understand what it means to be a version of beta_p. There is no beta_p in the model, so maybe it refers to some external doc? In the definition of mu, you can see beta acting as the location of the non-centered parameterization.

Did anyone spot the bug in this model? This is the real reason we don’t trust doc and have to read the code. It only works for K = 2. You’ll see a hard-coded 1 and 2 on the line defining mu[i], despite other parts of the program using K. My advice in this situation is to just bite the bullet and code the K = 2 case first, then generalize later if you need to. I coded the general case above.

I want to emphasize that I’m not trying to pick on Will Hipson here. I’m assuming his intent was to be pedagogical, as the comment density drops as the models get more complicated. And the code is really good—better than in our User’s Guide.

This example also emphasizes why code review is so useful—a second set of eyes is the top thing I can recommend for improving code. Just knowing your code will be reviewed helps you write better code.

A suggestion on how to improve the broader impacts statement requirement for AI/ML papers

This is Jessica. Recall that in 2020, NeurIPS added a requirement that authors include a statement of ethical aspects and future societal consequences extending to both positive and negative outcomes. Since then, requiring broader impact statements in machine learning papers has become a thing.

The 2024 NeurIPS call has not yet been released, but in 2023 authors were required to complete a checklist where they had to respond to the following: “If appropriate for the scope and focus of your paper, did you discuss potential negative societal impacts of your work?”, with either Y, N, or N/A with explanation as appropriate. More recently, ICML introduced a requirement that authors include impact statements in submitted papers: “a statement of the potential broader impact of their work, including its ethical aspects and future societal consequences. This statement should be in a separate section at the end of the paper (co-located with Acknowledgements, before References), and does not count toward the paper page limit.”

ICML provided authors who didn’t feel they had much to say the following boiler-plate text:

“This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here.”  

but warned authors “to think about whether there is content which does warrant further discussion, as this statement will be apparent if the paper is later flagged for ethics review.”

I find this slightly amusing in that it sounds like what I would expect authors to be thinking even without an impact statement: This work is like, so impactful, for society at large. It’s just like, really important, on so many levels. We’re out of space unfortunately, so we’ll have to leave it at that.\newline\newline\newline\newline Love, \newline\newline\newline\newline the authors \newline\newline\newline\newline

I have an idea that might increase the value of the exercises, both for authors and those advocating for the requirements: Have authors address potential impacts in the context of their discussion of related work *with references to relevant critical work*, rather than expecting them to write something based on their own knowledge and impressions (which is likely to be hard for many authors for reasons I discuss below).  In other words, treat the impact statement as another dimension of contextualizing one’s work against existing scholarship, rather than a free-form brainstorm.

Why do I think this could be an improvement?  Here’s what I see as the main challenges these measures run into (both my own thoughts and those discussed by others):  

  1. Lack of incentives for researchers to be forthright about possible negative implications of their work, and consequently a lack of depth in the statements they write. Having them instead find and cite existing critical work on ethical or societal impacts doesn’t completely reconcile this, but presumably the critical papers aren’t facing quite the same incentives to say only the minimum amount. I expect it is easier for the authors to refer to the kind of critiques that ethics experts think are helpful than it is for them to write such critical reflections themselves.
  2. Lack of transparency around how impacts statements factor into reviews of papers. Authors perceive reviewing around impacts statements as a black box, and have responded negatively to the idea that their paper could potentially get rejected for not sufficiently addressing broader impacts. But authors have existing expectations about the consequences for not citing some relevant piece of prior work.
  3. Doubts about whether AI/ML researchers are qualified to be reflecting on the broader impacts of their work. Relative to say, the humanities, or even areas of computer science that are closer to social science, like HCI, it seems pretty reasonable to assume that researchers submitting machine learning papers are less likely to gravitate to and be skilled at thinking about social and ethical problems, but skilled at thinking about technical problems. Social impacts of technology require different sensibilities and training to make progress on (though I think there are also technical components to these problems as well, which is why both sides are needed). Why not acknowledge this by encouraging the authors to first consult what has been said by experts in these areas, and add their two cents only if there are aspects of the possible impacts or steps to be taken to address them (e.g., algorithmic solutions) that they perceive to be unaddressed by existing scholarship? This would better acknowledge that just any old attempt to address ethics is not enough (consider, e.g., Gemini’s attempt not to stereotype, which was not an appropriate way to integrate ethical concerns into the tech). It would also potentially encourage more exchange between what currently can appear to be two very divided camps of researchers.
  4. Lack of established processes for reflecting on ethical implications in time to do something about them (e.g., choose a different research direction) in tech research. Related work is often one of the first sections to be written in my experience, so at least those authors who start working on their paper in advance of the deadline might have a better chance of acknowledging potential problems and adjusting their work in response. I’m less convinced that this will make much of a difference in many cases, but thinking about ethical implications early is part of the end goal of requiring broader impacts statements as far as I can tell, and my proposal seems more likely to help than hurt for that goal.

The above challenges are not purely coming from my imagination. I was involved in a couple survey papers led by Priyanka Nanayakkara on what authors said in NeurIPS broader impacts statements, and many contained fairly vacuous statements that might call out buzzwords like privacy or fairness but didn’t really engage with existing research. If we think it’s important to properly understand and address potential negative societal impacts of technology, which is the premise of requiring impacts statements to begin with, why expect a few sentences that authors may well be adding at the last minute to do this justice? (For further evidence that that is what’s happening in some cases, see e.g., this paper reporting on the experiences of authors writing statements). Presumably the target audience of the impact statements would benefit from actual scholarship on the societal implications over rushed and unsourced throwing around of ethical-sounding terms. And the authors would benefit from having to consult what those who are investing the time to think through potential negative consequences carefully have to say.

Some other positive byproducts of this might be that the published record does a better job of pointing to where critical scholarship needs to be further developed (again, leading to more of a dialogue between the authors and the critics). This seems important, as some of the societal implications of new ML contributions will require both ethicists and technologists to address. And those investing the time to think carefully about potential implications should see more engagement with their work among those building the tools.

I described this to Priyanka, who also read a draft of this post, and she pointed out that an implicit premise of the broader impact requirements is that the authors are uniquely positioned to comment on the potential harms of their work pre-deployment. I don’t think this is totally off base (since obviously the authors understand the work at a more detailed level than most critics), but to me it misses a big part of the problem: that of misaligned incentives and training (#1, #3 above). It seems contradictory to imply that these potential consequences are not obvious and require careful reflection AND that people who have not considered them before will be capable of doing a good job at articulating them.

At the end of the day, the above proposal is an attempt to turn an activity that I suspect currently feels “religious” for many authors into something they can apply their existing “secular” skills to. 

Blog is adapted to laptops or desktops, not to smartphones or pads.

Sean Manning writes:

People behave differently on the post-2008 Internet than before because most of them are on smartphones or pads, not laptops or desktops. For example, it’s hard to copy and paste blocks of text on a touchscreen, but usually easy to make screenshots, so people move things from one site to another as screenshots. It’s hard to jump precisely around a text and type punctuation marks, so it’s hard to enter bbcode. It’s easy to scroll, so sites designed for smartphones often have an infinite scroll. It’s easy to pull out a smartphone in breaks from other activities, so people visiting the Internet on a smartphone are often in a hurry. People do more of what their tools encourage (affordances) and less of what their tools discourage.

Good point! I hadn’t thought of it that way, partly I guess because I don’t have a mobile phone or pad, so I do very little interaction with touchscreens.

A few years ago someone contacted me with a proposal to fix up the blog and make it more friendly to mobile devices, but it wasn’t clear to me that these changes would actually work. Or, to put it another way, it seemed that any changes would either be too minor to make a difference, or so major that they wouldn’t work with the sort of content we have here. What I hadn’t thought about was Manning’s point, that the way we write and interact on this blog is in some ways a function of how we interact with it on the computer.

There probably are some ways of making the blog more mobile-friendly, but I guess the real point is that the style of communication we’ve developed here works for this format. Kinda like how some stories work better as movies, some as TV shows, and some as plays. You can transfer from one medium to another but they’re different.

Free online book by Bruno Nicenboim, Daniel Schad, and Shravan Vasishth on Bayesian inference and hierarchical modeling using brms and Stan

Shravan points us to these materials:

Hierarchical models are bread and butter stuff for psycholinguists, so we are trying hard to make Stan/brms mainstream through various means. Teaching this stuff feels like the most important work I am doing right now, more important even than the scientific side of things.

We have chapters on hierarchical modeling in our book (to be published soon with CRC Press), we use both brms and Stan:

https://vasishth.github.io/bayescogsci/book/ [edit: made it a live link]

The online version will remain available for free. Comments/corrections are welcome; one can open issues: https://github.com/vasishth/bayescogsci/issues

This summer, I [Shravan] am teaching an intro to Bayes using brms/Stan, with a focus on hierarchical modeling, especially directed at researchers in linguistics who do experimental work:

https://www.mils.ugent.be/courses/module-9-bayesian-data-analysis/

Plus, at Potsdam, for the last seven years I have been running an annual summer school on stats for linguistics and psych, where our focus is on hierarchical modeling using Stan/brms:

https://vasishth.github.io/smlp2024/

Here, we teach both frequentist and Bayesian approaches to hierarchical modeling.

Cool! Good to have these resources out there.

Tutorial on varying-intercept, varying-slope multilevel models in Stan, from Will Hipson

I was teaching varying-intercept, varying-slope multilevel models, and . . . I can get them to fit in Stan, but the code is kinda ugly, so I was struggling to clean it up, with no success. This will be a real research project, to add appropriate functions and possibly expand the Stan language so that these models can be written at a higher, more intuitive level.

Varying-intercept models aren’t so bad. In lme4 or blme or rstanarm or brms, you write something like:

y ~ (1 | group) + x + z + x:z

and that transfers pretty directly into Stan. Just create the X matrix and go from there. Indeed, you can add as many batches of varying coefficients as you like, and it’s no biggie to code it up.

But once you get to varying intercepts and slopes, it all changes. In lme4 or blme or rstanarm or brms, you can just write things like:

y ~ (1 + z | group) + x + z + x:z

But if you want to program this directly in Stan, once you have varying intercepts and slopes, you have to deal with covariance-matrix decompositions and arrays of coefficient vectors, and it’s all a hairy mess.

What to do? For this semester’s class, Imma just gonna go with lme4/blme/rstanarm when fitting varying-intercept, varying-slope models. All this Stan coding is a rabbit hole that’s getting us away from the goal, which is to be able to fit, use, and evaluate statistical models for measurement and variation.

I would like to be able to more easily fit these in Stan, though. Why, you might ask? If we can fit them in lme4, or blme for more stability, or rstanarm for including more uncertainty in the inferences, then why bother coding directly in Stan?

The answer for why we want to code directly in Stan is that we’re often wanting to expand our models, for example adding mixture components, measurement error terms, time series or spatial dependence, etc.

For that reason, you will want to be able to code varying-intercept, varying-slope models in Stan—even if I won’t be teaching that in class this semester.

The good news is that I did some googling and found this tutorial by Will Hipson on programming hierarchical regressions in Stan. It’s from 2020 and I have not looked at every line of code there, but it all looks reasonable and there’s lots of explanation of the workflow. So maybe this is the best place to start, if you want to go in this direction, as you should!

On the border between credulity and postmodernism: The case of the UFO’s-as-space-aliens media insiders

I came across this post from Tyler Cowen:

From an email I [Cowen] sent to a well-known public intellectual:

I think the chance that the bodies turn out to be real aliens is quite low.

But the footage seems pretty convincing, a way for other people to see what…sources have been telling me for years. [Everyone needs to stop complaining that there are no photos!]

And to think it is a) the Chinese, b) USG secret project, or…whatever…*in Mexico* strains the imagination.

It is interesting of course how the media is not so keen to report on this. They don’t have to talk about the aliens, they could just run a story “The Mexican government has gone insane.” But they won’t do that, and so you should update your mental model of the media a bit in the “they are actually pretty conservative, in the literal sense of that term, and quite readily can act like a deer frozen in the headlights, though at some point they may lurch forward with something ill-conceived.”

Many of you readers are from Christian societies, or you are Christian. But please do not focus on the bodies! I know you are from your early upbringing “trained” to do so, even if you are a non-believer. Wait until that evidence is truly verified (and I suspect it will not be). Focus on the video footage.

In any case, the Mexican revelations [sic] mean this issue is not going away, and perhaps this will force the hand of the USG to say more than they otherwise would have.

The above-linked post seems ridiculous to me, while comments on the post are much more reasonable—I guess it’s not hard to be reasonable when all you have to do is laugh at a silly hoax.

From a straight-up econ point of view I guess it makes sense that there has been a continuing supply of purported evidence for space aliens: there’s a big demand for this sort of thing so people will create some supply. It’s disappointing to me to see someone as usually-savvy as Cowen falling for this sort of thing, but there’s some selection bias, as I’m not writing about all the people out there who have not been snookered by this Bermuda triangle ancient astronauts Noah’s ark fairies haunted radios bigfoot ESP ghosts space aliens stuff.

Given my earlier post on news media insiders getting all excited about UFOs (also this), you won’t be surprised to hear that I’m annoyed by Cowen’s latest. It’s just so ridiculous! Amusingly, his phrasing, “I think the chance that the bodies turn out to be real aliens is quite low,” echoes that of fellow contrarian pundit Nate Silver, who wrote, “I’m not saying it’s aliens, it’s almost definitely not aliens.” Credit them for getting the probability on the right side of 50%, but . . . c’mon.

As I wrote in my earlier posts, what’s noteworthy is not that various prominent people think that UFO’s might be space aliens—as I never tire of saying in this context, 30% of Americans say they believe in ghosts, which have pretty much the same basis in reality—rather, what’s interesting is that they feel so free to admit this belief. I attribute this to a sort of elite-media contagion: Ezra Klein and Tyler Cowen believe the space aliens thing is a possibility, they’re smart guys, so other journalists take it more seriously, etc. Those of us outside the bubble can just laugh, but someone like Nate Silver is too much of an insider and is subject to the gravitational pull of elite media, twitter, etc.

Mark Palko offers a slightly different take, attributing the latest burst of elite credulity to the aftereffects of a true believer who managed to place a few space-aliens-curious stories into the New York Times, which then gave the story an air of legitimacy etc.

The space aliens thing is interesting in part because it does not seem strongly connected to political polarization. You’ve got Cowen on the right, Klein on the left, and Silver on the center-left. OK, just three data points, but still. Meanwhile, Cowen gets a lot of far-right commenters, but most of the commenters to his recent post are with me on this one, just kind of baffled that he’s pushing the story.

Postmodernism

A couple days after seeing Cowen’s post, I happened to be reading a book that discussed postmodernism in the writing of history. I don’t care so much about postmodernism, but the book was interesting; I’ll discuss it in a future post.

In any case, here’s the connection I saw.

Postmodernism means different things to different people, but one of its key tenets is that there is no objective truth . . . uhhhh, let me just “do a wegman” here and quote wikipedia:

Postmodernism is an intellectual stance or mode of discourse which challenges worldviews associated with Enlightenment rationality dating back to the 17th century. Postmodernism is associated with relativism and a focus on the role of ideology in the maintenance of economic and political power. Postmodernists are “skeptical of explanations which claim to be valid for all groups, cultures, traditions, or races, and instead focuses on the relative truths of each person”. It considers “reality” to be a mental construct. Postmodernism rejects the possibility of unmediated reality or objectively-rational knowledge, asserting that all interpretations are contingent on the perspective from which they are made; claims to objective fact are dismissed as naive realism.

One thing that struck me about Cowen’s post was not just that he’s sympathetic to the space-aliens hypothesis; it’s also that it seems to bug him that the elite news media isn’t covering it more widely. Which is funny, because it bugs me that the media (including Bloomberg columnist Cowen) are taking it as seriously as they do!

Cowen writes, “It is interesting of course how the media is not so keen to report on this.” Doesn’t seem so interesting to me! My take is that most people in the media have some common sense and also have some sense of the history of this sort of nexus of hoaxes and credulity, from Arthur Conan Doyle onward.

The postmodernism that I see coming from Cowen is in the statement, “the footage seems pretty convincing, a way for other people to see what . . . sources have been telling me for years,” which seems to me, as a traditional rationalist or non-postmodernist, to be a form of circular reasoning: saying that something is real because people believe in it. Saying “this issue is not going away” . . . I mean, sure, astrology isn’t going away either! Unfortunately, just about nothing ever seems to go away.

Oppositionism

There’s something else going on here that’s hard for me to put my finger on, exactly . . . something about belief in the occult as being oppositional, something “they” don’t want you to know about, whether “they” is “the media” or “the government” or “organized religion” or “the patriarchy” or “the medical establishment” or whatever. As we discussed in an earlier post on the topic, one interesting thing is how things happen that push certain fringe beliefs into a zone where it’s considered legitimate to take them seriously. As a student of public opinion and politics, I’m interested not just in who has these beliefs and why, but also in the processes by which some such beliefs but not others circulate so that they seem perfectly normal to various people such as Cowen, Silver, etc., in the elite news media bubble.

Hand-drawn Statistical Workflow at Nelson Mandela

In September 2023 I taught a week-long course on statistical workflow at the Nelson Mandela African Institution of Science and Technology (NM-AIST), a public postgraduate research university in Arusha, Tanzania established in 2009.

NM-AIST – CENIT@EA

The course was hosted by Dean Professor Ernest Rashid Mbega and the Africa Centre for Research, Agricultural Advancement, Teaching Excellence and Sustainability (CREATES) through the Leader Professor Hulda Swai and Manager Rose Mosha.

Our case study was an experiment on the NM-AIST campus designed and implemented by Dr Arjun Potter and Charles Luchagula to study the effects of drought, fire, and herbivory on growth of various acacia tree species. The focus was pre-data workflow steps, i.e. experimental design. The goal for the week was to learn some shared statistical language so that scientists can work with statisticians on their research.

Together with Arjun and Charles, with input from Drs Emmanuel Mpolya, Anna Treydte, Andrew Gelman, Michael Betancourt, Avi Feller, Daphna Harel, and Joe Blitzstein, I created course materials full of activities. We asked participants to hand-draw the experimental design and their priors, working together with their teammates. We also did some pencil-and-paper math and some coding in R.

Course participants were students and staff from across NM-AIST. Over the five days, between 15 and 25 participants attended on a given day.

Using the participants’ ecological expertise, we built a model to tell a mathematical story of how acacia tree height could vary by drought, fire, herbivory, species, and plot location. We simulated parameters and data from this model, e.g., beta_fire = rnorm(n = 1, mean = -2, sd = 1), then simulated_data = rnorm(n, beta_0 + beta_fire*Fire + … + beta_block[Block], sd_tree). We then fit the model to the simulated data.

Due to difficulty in manipulating fire, fire was assigned at the block-level, whereas drought and herbivory were assigned at the sub-block level. We saw how this reduced precision in estimating the effect of fire:

We redid the simulation assuming a smaller block effect and saw improved precision. This confirmed the researchers’ intuition that they need to work hard to reduce the block-to-block differences.

To keep the focus on concepts not code, we only simulated once from the model. A full design analysis would include many simulations from the model. In Section 16.6 of ROS they fix one value for the parameters and simulate multiple datasets. In Gelman and Carlin (2014) they consider a range of plausible parameters using prior information. Betancourt’s workflow simulates parameters from the prior.
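To give a flavor of what a repeated-simulation design analysis looks like, here is a minimal sketch in Python (rather than the R we used in class), with made-up numbers rather than the actual acacia design: it estimates the fire effect by a simple difference of block means and shows how the block-to-block standard deviation drives the precision of that estimate.

import numpy as np

rng = np.random.default_rng(2023)

def fire_effect_estimate(n_blocks=8, trees_per_block=20,
                         beta_fire=-2.0, sd_block=1.0, sd_tree=1.0):
    # Fire is assigned at the block level: half the blocks burned, half not.
    fire = np.repeat([0, 1], n_blocks // 2)
    block_effect = rng.normal(0, sd_block, n_blocks)
    # Mean tree height per block (intercept omitted; it cancels in the difference).
    block_means = np.array([
        np.mean(beta_fire * fire[b] + block_effect[b]
                + rng.normal(0, sd_tree, trees_per_block))
        for b in range(n_blocks)
    ])
    # Difference-in-means estimate of the fire effect.
    return block_means[fire == 1].mean() - block_means[fire == 0].mean()

for sd_block in [2.0, 0.5]:
    estimates = [fire_effect_estimate(sd_block=sd_block) for _ in range(1000)]
    print(f"sd_block = {sd_block}: sd of estimated fire effect = {np.std(estimates):.2f}")

Under these made-up numbers, the fire estimate is several times noisier when the block standard deviation is large, which is the qualitative pattern described above.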

Our course evaluation survey was completed by 14 participants. When asked “which parts of the class were most helpful to you to understand the concepts?”, respondents chose instructor explanations, drawings, and activities as more helpful than the R code. However, participants also expressed eagerness to learn R and to analyze the real data in our next course.

The hand-drawn course materials and activities were inspired by Brendan Leonard’s illustrations in Bears Don’t Care About Your Problems and I Hate Running and You Can Too. Brendan wrote me,

I kind of think hand-drawing stuff makes it more fun and also maybe less intimidating?

I agree.

More recently, I have been reading Introduction to Modern Causal Inference by Alejandro Schuler and Mark van der Laan, who say

It’s easy to feel like you don’t belong or aren’t good enough to participate…

yup.

To deal with that problem, the voice we use throughout this book is informal and decidedly nonacademic…Figures are hand-drawn and cartoonish.

I’m excited to return to NM-AIST to continue the workflow steps with the data that Dr Arjun Potter and Charles Luchagula have been collecting. With the real data, we can ask: is our model realistic enough to achieve our scientific goals?

When Steve Bannon meets the Center for Open Science: Bad science and bad reporting combine to yield another ovulation/voting disaster

The Kangaroo with a feather effect

A couple of faithful correspondents pointed me to this recent article, “Fertility Fails to Predict Voter Preference for the 2020 Election: A Pre-Registered Replication of Navarrete et al. (2010).”

It’s similar to other studies of ovulation and voting that we’ve criticized in the past (see for example pages 638-640 of this paper).

A few years ago I ran across the following recommendation for replication:

One way to put a stop to all this uncertainty: preregistration of studies of all kinds. It won’t quell existing worries, but it will help to prevent new ones, and eventually the truth will out.

My reaction was that this was way too optimistic. The ovulation-and-voting study had large measurement error, high levels of variation, and any underlying effects were small. And all this is made even worse because they were studying within-person effects using a between-person design. So any statistically significant difference they find is likely to be in the wrong direction and is essentially certain to be a huge overestimate. That is, the design has a high Type S error rate and a high Type M error rate.

And, indeed, that’s what happened with the replication. It was a between-person comparison (that is, each person was surveyed at only one time point), there was no direct measurement of fertility, and this new study was powered to only be able to detect effects that were much larger than would be scientifically plausible.

The result: a pile of noise.
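To see why, here’s a small simulation in the spirit of the Gelman and Carlin (2014) design calculations, with made-up numbers (a true effect of 1 percentage point measured with a standard error of 8 points) rather than anything estimated from the ovulation-and-voting studies:

import numpy as np

rng = np.random.default_rng(1)

true_effect, se, n_sims = 1.0, 8.0, 100_000   # hypothetical effect and standard error

estimates = rng.normal(true_effect, se, n_sims)        # replicated noisy estimates
significant = np.abs(estimates) > 1.96 * se            # "statistically significant"

power = significant.mean()
type_s = (estimates[significant] < 0).mean()                  # wrong sign, given significance
type_m = np.abs(estimates[significant]).mean() / true_effect  # exaggeration ratio

print(f"power ~ {power:.2f}, Type S rate ~ {type_s:.2f}, exaggeration ~ {type_m:.0f}x")

With numbers like these, only about 5% of studies reach significance, roughly a third of those significant estimates have the wrong sign, and the significant estimates overstate the true effect by an order of magnitude.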

To the authors’ credit, their title leads right off with “Fertility Fails to Predict . . .” OK, not quite right, as they didn’t actually measure fertility, but at least they foregrounded their negative finding.

Bad Science

Is it fair for me to call this “bad science”? I think this description is fair. Let me emphasize that I’m not saying the authors of this study are bad people. Remember our principle that honesty and transparency are not enough. You can be of pure heart, but if you are studying a small and highly variable effect using a noisy design and crude measurement tools, you’re not going to learn anything useful. You might as well just be flipping coins or trying to find patterns in a table of random numbers. And that’s what’s going on here.

Indeed, this is one of the things that’s bothered me for years about preregistered replications. I love the idea of preregistration, and I love the idea of replication. These are useful tools for strengthening research that is potentially good research and for providing some perspective on questionable research that’s been done in the past. Even the mere prospect of preregistered replication can be a helpful conceptual tool when considering an existing literature or potential new studies.

But . . . if you take a hopelessly noisy design and preregister it, that doesn’t make it a good study. Put a pile of junk in a fancy suit and it’s still a pile of junk.

In some settings, I fear that “replication” is serving as a shiny object to distract people from the central issues of measurement, and I think that’s what’s going on here. The authors of this study were working with some vague ideas of evolutionary psychology, and they seem to be working under the assumption that, if you’re interested in theory X, the way to science is to gather some data that have some indirect connection to X and then compute some statistical analysis in order to make an up-or-down decision (“statistically significant / not significant” or “replicated / not replicated”).

Again, that’s not enuf! Science isn’t just about theory, data, analysis, and conclusions. It’s also about measurement. It’s quantitative. And some measurements and designs are just too noisy to be useful.

As we wrote a few years ago,

My criticism of the ovulation-and-voting study is ultimately quantitative. Their effect size is tiny and their measurement error is huge. My best analogy is that they are trying to use a bathroom scale to weigh a feather—and the feather is resting loosely in the pouch of a kangaroo that is vigorously jumping up and down.

At some point, a set of measurements is so noisy that biases in selection and interpretation overwhelm any signal and, indeed, nothing useful can be learned from them. I assume that the underlying effect size in this case is not zero—if we were to look carefully, we would find some differences in political attitude at different times of the month for women, also different days of the week for men and for women, and different hours of the day, and I expect all these differences would interact with everything—not just marital status but also age, education, political attitudes, number of children, size of tax bill, etc etc. There’s an endless number of small effects, positive and negative, bubbling around.

Bad Reporting

Bad science is compounded by bad reporting. Someone pointed me to a website called “The National Pulse,” which labels itself as “radically independent” but seems to be an organ of the Trump wing of the Republican party, and which featured this story, which they seem to have picked up from the notorious sensationalist site, The Daily Mail:

STUDY: Women More Likely to Vote Trump During Most Fertile Point of Menstrual Cycle.

A new scientific study indicates women are more likely to vote for former President Donald Trump during the most fertile period of their menstrual cycle. According to researchers from the New School for Social Research, led by psychologist Jessica L Engelbrecht, women, when at their most fertile, are drawn to the former President’s intelligence in comparison to his political opponents. The research occurred between July and August 2020, observing 549 women to identify changes in their political opinions over time. . . .

A significant correlation was noticed between women at their most fertile and expressing positive opinions towards former President Donald Trump. . . . the 2020 study indicated that women, while ovulating, were drawn to former President Trump because of his high degree of intelligence, not physical attractiveness. . . .

As I wrote above, I think that research study was bad, but, conditional on the bad design and measurement, its authors seem to have reported it honestly.

The news report adds new levels of distortion.

– The report states that the study observed women “to identify changes in their political opinions over time.” First, the study didn’t “observe” anyone; they conducted an online survey. Second, they didn’t identify any changes over time: the women in the study were surveyed only once!

– The report says something about “a significant correlation” and that “the study indicated that . . .” This surprised me, given that the paper itself was titled, “Fertility Fails to Predict Voter Preference for the 2020 Election.” How do you get from “fails to predict” to “a significant correlation”? I looked at the journal article and found the relevant bit:

Results of this analysis for all 14 matchups appear in Table 2. In contrast to the original study’s findings, only in the Trump-Obama matchup was there a significant relationship between conception risk and voting preference [r_pb (475) = −.106, p = .021] such that the probability of intending to vote for Donald J. Trump rose with conception risk.

Got it? They looked at 14 comparisons. Out of these, one was “statistically significant” at the 5% level. This is the kind of thing you’d expect to see from pure noise, or the mathematical equivalent, which is a study with noisy measurements of small and variable effects. The authors write, “however, it is possible that this is a Type I error, as it was the only significant result across the matchups we analyzed,” which I think is still too credulous a way to put it; a more accurate summary would be to say that the data are consistent with null effects, which is no surprise given the realistic possible sizes of any effects in this very underpowered study.
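Just to put a number on “the kind of thing you’d expect to see from pure noise”: if all 14 matchups were null and the tests were independent (an approximation, since the matchups share respondents), the chance of at least one nominally significant result at the 5% level would be about one in two:

p_at_least_one = 1 - 0.95 ** 14
print(round(p_at_least_one, 2))   # about 0.51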

The authors of the journal article also write, “Several factors may account for the discrepancy between our [lack of replication of] the original results.” They go on for six paragraphs giving possible theories—but never once considering the possibility that the original studies and theirs were just too noisy to learn anything useful.

Look. I don’t mind a bit of storytelling: why not? Storytelling is fun, and it can be a good way to think about scientific hypotheses and their implications. The reason we do social science is because we’re interested in the social world; we’re not just number crunchers. So I don’t mind that the authors had several paragraphs with stories. The problem is not that they’re telling stories, it’s that they’re only telling stories. They don’t ever reflect that this entire literature is chasing patterns in noise.

And this lack of reflection about measurement and effect size is destroying them! They went to all this trouble to replicate this old study, without ever grappling with that study’s fundamental flaw (see kangaroo picture at the top of this post). Again, I’m not saying that the authors are bad people or that they intend to mislead; they’re just doing bad, 2010-2015-era psychological science. They don’t know better, and they haven’t been well served by the academic psychology establishment, which has promoted and continues to promote this sort of junk science.

Don’t blame the authors of the bad study for the terrible distorted reporting

Finally, it’s not the authors’ fault that their study was misreported by the Daily Mail and that Steve Bannon-associated website. “Fails to Predict” is right there in the title of the journal article. If clickbait websites and political propagandists want to pull out that p = 0.02 result from your 14 comparisons and spin a tale around it, you can’t really stop them.

The Center for Open Science!

Science reform buffs will enjoy these final bits from the published paper:

“Not once in the twentieth century . . . has a single politician, actor, athlete, or surgeon emerged as a first-rate novelist, despite the dismayingly huge breadth of experience each profession affords.”

Tom Bissell writes:

Recently, in The Spooky Art, Norman Mailer [wrote that] Not once in the twentieth century . . . has a single politician, actor, athlete, or surgeon emerged as a first-rate novelist, despite the dismayingly huge breadth of experience each profession affords. For better or worse, and I am prepared to admit worse, writers are writers are writers. This explains why so many mediocre fiction writers sound the same, why there exist so many books about writers, and why many talented fiction writers seem to think that their best option to distinguish themselves is to flee the quotidian to explore more fanciful subject matter.

That’s an interesting point. Here in the twenty-first century, novel writing is a niche art and a niche business. In the previous century, though, the novel was a major popular art form, and lots of people were motivated to write them, both for artistic and financial reasons. Great novels were written in the twentieth century by people with all sorts of social backgrounds, high, low, and various steps in between—George Orwell was a police officer!—but I think Mailer was right, that none of these great novels were written by politicians, actors, athletes, or surgeons. Perhaps the closest candidate is Michael Crichton (not a surgeon but he was trained as a doctor; no great novels but he did write Jurassic Park, which was solid genre fiction). Had his novels not been successful, it seems likely he would’ve just become a doctor, which indicates a bit of selection bias in Mailer’s statement. Jim Bouton authored the literary classic Ball Four, but it’s not a novel and presumably the writing was mostly done by his coauthor, who was a professional writer. OK, I guess my best shots on this are George V. Higgins (author of some arguably-great novels (see also here) and also a practicing lawyer) and Scott Turow (also a practicing lawyer as well as an author of several excellent legal thrillers which, ok, they’re not great novels but they have a lot of strengths, I guess I’d say they’re better than Michael Crichton’s even if they don’t have the originality of someone like Jim Thompson). But “lawyer” is not quite the same category as “politician, actor, athlete, or surgeon”—indeed, a lawyer is already a sort of professional fiction writer.

I dunno, it’s an interesting question. I assume there were a fair number of twentieth-century politicians, actors, athletes, and surgeons who had the capacity to write a great novel, or at least make a fair attempt, but it doesn’t seem to have happened. Maybe it would just have taken too much effort, to the extent that, had they gone all-in to write a great novel or a reasonable attempt at one, they would’ve just become full-time writers, and that’s what we’d remember them as. I’m not sure.

Gore Vidal was a politician (kind of) and wrote some excellent novels, maybe they don’t count as “great” but maybe they do. He’s the closest match I can think of—but maybe not, because he was a writer before going into politics, so he doesn’t really count as a politician emerging as a novelist.

P.S. Bissell’s article also discusses the idea of writers being outsiders, which motivates me to point to these two posts:

There often seems to be an assumption that being in the elite and being an outsider are mutually exclusive qualities, but they’re not.

The insider-outsider perspective

As I wrote in the comments to one of those posts:

Saying someone is an outsider doesn’t convey much information, given that just about anyone can grab that label. However, as an observer of politics (and science), I notice that people sometimes highlight their outsider status, and as a political scientist I find that interesting. For example, what’s interesting about Steven Levitt in Freakonomics is not so much that he thinks of himself as a “rogue” but that he decided to label himself that way. Rather than presenting himself as an informant from the inside, he presented himself as an outsider. He had the choice of taking either tack, and he decided on the outsider label. That’s interesting.

Why would people want economics advice from a “rogue” outsider who thinks that drunk walking is more dangerous than drunk driving, thinks we are assured of 30 years of global cooling, and believes that beautiful parents are 36% more likely to have girls? Wouldn’t you prefer economics advice from an insider, someone with a Harvard and MIT education who’s now the William B. Ogden Distinguished Service Professor of Economics at the University of Chicago? That’s what baffles me.

The outsider-novelist thing is more clear, in that different authors offer different perspectives. We read Jack London for one thing and Jane Austen for another.

“Science as Verified Trust”

Interesting post by Sean Manning:

There seems to be a lot of confusion about the role of trust in science or scholarship. Engineers such as Bill Nye and political propagandists throw around the phrase “trust the science”! On the other hand, the rationalists whom I mentioned last year brandish the Royal Society’s motto nullius in verba “Take nobody’s word for it” like a sword. I [Manning] think both sides are working from some misconceptions about how science or scholarship work. . . .

What makes this scientific or scholarly is not that you do every step yourself. It is that every step of the argument has been checked by multiple independent people, so in most cases you can quickly see if those people disagree and then trust those preliminary steps. Science or scholarship is not about heroes who know every skill, it’s about systems of questioning and verification which let us provisionally assume that some things are true while we focus on something where we are not sure of the answer. . . .

The New York Young Republican Club

This story hit the news yesterday:

Rep. Marjorie Taylor Greene (R-Georgia) entertained Republicans in Manhattan Saturday night with a range of one-liners trolling the political left on hot-button topics.

“I want to tell you something, if Steve Bannon and I had organized that, we would have won. Not to mention, we would’ve been armed,” she said of her role at the Jan. 6, 2021 attack on the US Capitol. . . .

The controversial congresswoman was one of several high-profile conservative firebrands . . . at the annual event hosted by the New York Young Republican Club.

Her speech took a strange turn while she noted how “you can pick up a butt plug or a dildo at Target nowadays” . . .

I was curious so I did a quick search:

OK, yeah, I guess she’s right!

Anyway, this all reminded me that I spoke at the Young Republican Club once! It was in February, 2009, and I talked about our book, Red State Blue State. It was a mellow occasion. As I recall they told me they were looking forward to an upcoming softball game with the Young Democrats. I remember telling the Young Republicans that I’d recently given a talk at the Princeton Club of New York, and that 50 years earlier there would’ve been a big overlap between the Princeton Club and the Young Republican Club but not anymore.

It seems that the organization has changed even more in the past decade or so. I can’t imagine speaking at a club where they joke about armed overthrow of the government. That really bothers me. I guess that’s how they were talking 55 years ago at the Young Communist Club, or the Students for a Democratic Society.

Stepping back, we can understand this as part of the residue of a couple hundred years of 1776 rhetoric. If it was ok for Sam Adams, George Washington, etc., to have an armed insurrection against the British, and if it was ok to have a bunch of slaveowners have an armed insurrection against the U.S. government in 1861, then what exactly is wrong with modern-day congressmembers talking about shooting up the Capitol building? Once you accept the idea that Joe Biden and Abraham Lincoln are worse than George III, the rest all follows. From that perspective, it makes me wonder why there isn’t more of this sort of talk in public. Ultimately I guess it’s more of a pragmatic issue than a moral issue. It’s against the law to threaten to shoot people, so keep talking like that and you might go to jail. Also most voters aren’t into the whole insurrection thing, so if you’re a politician and you’re not in a safe seat, this sort of extremism could be politically risky. But from a theoretical perspective, sure, if 1776 or 1861 is the standard, then, yeah, shooting at government officials could be considered to be just fine.

There’s a difference. King George was not elected by the public; Abraham Lincoln and Joe Biden were. But, once you accept the idea that overthrowing the government is OK, I guess it’s no big deal if guns are involved.

Back in 2009, nobody at the Young Republican Club was talking about hijacking Congress. Or butt plugs, for that matter. At least not on the day I was there. Things have changed.

Why we say that honesty and transparency are not enough:

Someone recently asked me some questions about my article from a few years ago, Honesty and transparency are not enough. I thought it might be helpful to summarize why I’ve been promoting this idea.

The central message in that paper is that reproducibility is great, but if a study is too noisy (with the bias and variance of measurements being large compared to any persistent underlying effects), then making it reproducible won’t solve those problems. I wrote it for three reasons:

(a) I felt that reproducibility (or, more generally, “honesty and transparency”) was being oversold, and I didn’t want researchers to think that just cos they drink the reproducibility elixir, their studies will then be good. Reproducibility makes it harder to fool yourself and others, but it does not turn a hopelessly noisy study into good science.

(b) Lots of researchers are honest and transparent in their work but still do bad research. I wanted to be able to say that the research is bad without implying that I think they are being dishonest.

(c) Conversely, I was concerned that, when researchers heard about problems with bad research by others, they would think that the people who are doing that bad research are cheating in some way. This leads to the problem of researchers saying to themselves, “I’m honest, I don’t ‘p-hack,’ so my research can’t be bad.” Actually, though, lots of people do research that’s honest, transparent, and useless! That’s one reason I prefer to speak of “forking paths” rather than “p-hacking”: it’s less of an accusation and more of a description.

Michael Lewis.

I just read this interesting review by Patrick Redford of the new book by journalist Michael Lewis on Sam Bankman-Fried, the notorious crypto fraudster.

We discussed earlier how the news media, including those such as Michael Lewis and Tyler Cowen who play the roles of skeptics in the media ecosystem, were not just reporters of the crypto fraud; they also played an important part in promoting and sustaining the bubble. As I wrote when all this came out, the infrastructure of elite journalism was, I think, crucial to keeping the bubble afloat. Sure, crypto had lots of potential just from rich guys selling to each other and throwing venture capital at it, and suckers watching Alex Jones or whatever investing their life savings, but elite media promotion took it to the next level.

We’ve talked earlier about the Chestertonian principle that extreme skepticism is a form of credulity, an idea that seems particularly relevant to the comedian and political commentator Joe Rogan, whose twin stances of deep skepticism and deep credulity are inextricably intertwined. To be skeptical about the moon landing or the 2020 election requires belief in all sorts of ridiculous theories and discredited evidence. Skepticism and credulity here are not opposites—we’re not talking “horseshoe theory”—rather, they’re the same thing. Skepticism of the accepted official view that the moon landings actually happened, or that the laws of physics are correct and ghosts don’t exist, or that UFOs are not space aliens, or that Joe Biden won the 2020 election by 7 million votes, is intimately tied to active belief in some wacky theory or unsubstantiated or refuted empirical claim.

I’m not saying that skepticism is always a form of credulity, just that sometimes it is. When I was skeptical of the Freakonomics-endorsed claim that beautiful parents are 36% more likely to have girls, no credulity was required, just some background on sex ratios and a basic understanding of statistics. Similarly if you want to be skeptical of the claim that UFOs are space aliens etc. There’s ordinary skepticism and credulous skepticism. Ordinary skepticism, though, is easy to come by. Credulous skepticism, by its nature, is a more unstable quantity and requires continuing effort—you have to carefully protect your skeptical beliefs and keep them away from any stray bits of truth that might contaminate them. Which I guess is one reason that people such as Rogan who have the ability to do this with a straight face are so well compensated.

But what about Michael Lewis? Like everybody else, I’m a fan of Moneyball. I haven’t read any of his other books—I guess a lot of his books are about rich financial guys, and, while I know the topic is important, it’s never interested me so much—but then last year he interviewed me for a podcast! I was kinda scared at first—my previous experiences with journalists reporting on scientific controversies have been mixed, and I didn’t want to be walking into a trap—but it worked out just fine. Lewis was straightforward, with no hidden agenda. Here’s a link to the podcast, and here’s an article with some background. Perhaps I should’ve been more suspicious given that the podcast is produced by a company founded by plagiarism-defender

Of Lewis’s new book, Redford writes:

A common thematic thread, perhaps the common thread, wending throughout Michael Lewis’s bibliography is the limits of conventional wisdom. His oeuvre is stuffed with stories about the moments at which certain bedrock ideas—ones about finance, or baseball, or electoral politics—crumble under their own contradictions. This is helped along, often, by visionary seers—like Michael Burry, Billy Beane, or John McCain—who put themselves in position to take advantage of those who are either too blinkered or too afraid to see the unfamiliar future taking shape in front of them.

That describes Moneyball pretty accurately, and at first it would seem to fit a podcast called “Against the Rules,” but actually that podcast was all about how there was existing expertise, settled enough to be considered “conventional wisdom,” that was swept aside in a wave of confusion. In particular, Lewis talked about the Stanford covid crew, a group of well-connected iconoclasts in the Billy Beane mode, but he showered them with criticism, not praise. Maybe that podcast worked because he was going against type? I don’t know.

Just speaking in general terms, we shouldn’t ignore the visionary seers—Bill James, imperfect though he may have been, was really on to something, and even his missteps are often interesting. But we can’t assume the off-the-beaten-path thinkers are always right: that way lies Freakonomics-style madness, as here and here.

It’s too bad what happened to Lewis with the Bankman-Fried thing, but I wouldn’t attribute it to a general problem with his take on conventional wisdom. It’s more an omnipresent risk of journalism: the temptation to frame a story around a hero, which creates problems if the hero isn’t even a plausible anti-hero. (Recall that an “anti-hero” is not the opposite of a hero; rather, he’s someone who doesn’t look or act like a conventional hero but still is a hero in some sense.)

Scientific publishers busily thwarting science (again)

This post is by Lizzie.

I am working with some colleagues on how statistical methods may affect citation counts. For this, we needed to find some published papers. So one of these colleagues started downloading some. And their university quickly showed up with the following:

Yesterday we received three separate systematic downloading warnings from publishers Taylor & Francis, Wiley and UChicago associating the activity with [… your] office desktop computer’s address. As allowed in our licenses with those publishers, they have already blocked access from that IP address and have asked us to investigate.

Unfortunately, all of those publishers specifically prohibit systematic downloading for any purpose, including legitimate bibliometric or citation analysis.

Isn’t that great? I review for all of these companies for free, in rare cases I pay them to publish my papers, and then they use all that money to do this? Oh, and the university library signed a contract so now they pay someone to send these emails… that’s just great. I know we all know this is a depressing cabal, but this one surprised me.

In other news, this photo is from my (other) colleague’s office, where I am visiting for a couple days.

Hey! A new (to me) text message scam! Involving a barfing dog!

Last year Columbia changed our phone system so now we can accept text messages. This can be convenient, and sometimes people reach me that way.

But then the other day this text came in:

And, the next day:

Someone’s dog has been vomiting, and this person is calling from two different numbers—home and work, perhaps? That’s too bad! I hope they reach the real Dr. Ella before the dog gets too sick.

Then this:

And now I started getting suspicious. How exactly does someone get my number as a wrong number for a veterinarian? I’ve had this work number for over 25 years! It could be that someone typed in a phone number wrong. But . . . how likely is it that two unrelated people (the owner of a sick dog and the seller of veterinary products) would mistype someone’s number in the exact same way on the exact same day?

Also, “Dr. Ella”? I get that people give their doctors nicknames like that, but in a message to the office they would use the doctor’s last name, no?

Meanwhile, these came in:

Lisa, Ella, whatever. Still it seemed like some kinda mixup, and I had no thought that it might be a scam until I came across this post from Max Read, “What’s the deal with all those weird wrong-number texts?”, which answered all my questions.

Apparently the veterinarian, the yachts, and all the rest, are just a pretext to get you involved in a conversation where the scammers then befriend you before stealing as much of your money as they can. Kinda mean, huh? Can’t they do something more socially beneficial, like do some politically incorrect p-hacking or something involving soup bowls or paper shredders? Or just plagiarize a book about giraffes?

When do we expect conformal prediction sets to be helpful? 

This is Jessica. Over on substack, Ben Recht has been posing some questions about the value of prediction bands with marginal guarantees, such as one gets from conformal prediction. It’s an interesting discussion that caught my attention since I have also been musing about where conformal prediction might be helpful. 

To briefly review: given a training data set (X1, Y1), … , (Xn, Yn) and a test point (Xn+1, Yn+1) drawn from the same distribution, conformal prediction returns a subset of the label space for which we can make coverage guarantees about the probability of containing the test point’s true label Yn+1. A prediction set Cn achieves distribution-free marginal coverage at level 1 − alpha when P(Yn+1 ∈ Cn(Xn+1)) >= 1 − alpha for all joint distributions P on (X, Y). The commonly used split conformal prediction procedure attains this by adding a couple of steps to the typical modeling workflow: you first split the data into a training set and a calibration set, fitting the model on the training set. You choose a heuristic notion of uncertainty from the trained model, such as the softmax values (pseudo-probabilities from the last layer of a neural network), to create a score function s(x, y) that encodes disagreement between x and y (in a regression setting these are just the residuals). You compute q_hat, the ⌈(n+1)(1−alpha)⌉/n empirical quantile of the scores on the calibration set. Then, given a new instance x_n+1, you construct a prediction set for y_n+1 by including all y’s for which the score is less than or equal to q_hat. There are various ways to achieve slightly better performance, such as using cumulative summed scores and regularization instead.
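
To make that recipe concrete, here is a minimal sketch of split conformal prediction for classification, written in Python with numpy. It assumes you already have heuristic scores (e.g., softmax outputs) for a held-out calibration set and for the new test points; the function and variable names (split_conformal_sets, cal_probs, and so on) are my own illustrative choices, not from any particular conformal-prediction library.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction sets for classification.

    cal_probs:  (n, K) array of heuristic scores (e.g., softmax outputs) on the calibration set
    cal_labels: (n,) array of true integer labels for the calibration set
    test_probs: (m, K) array of heuristic scores for the new test points
    Returns a list of m arrays of label indices (the prediction sets).
    """
    n = len(cal_labels)
    # Nonconformity score: one minus the softmax value of the true label
    cal_scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Conformal quantile level ceil((n+1)(1-alpha))/n, clipped at 1 for small n
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q_hat = np.quantile(cal_scores, level, method="higher")
    # Include every label whose score is at or below the threshold
    return [np.where(1.0 - p <= q_hat)[0] for p in test_probs]
```

Averaged over draws of the calibration and test data, sets constructed this way should contain the true label at least 1 − alpha of the time, which is exactly the marginal guarantee defined above rather than anything conditional on the particular instance.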

Recht makes several good points about limitations of conformal prediction, including:

—The marginal coverage guarantees are often not very useful. Instead we want conditional coverage guarantees that hold conditional on the value of Xn+1 we observe. But you can’t get true conditional coverage guarantees (i.e., P(Yn+1 ∈ Cn(Xn+1)|Xn+1 = x) >= 1 − alpha for all P and almost all x) if you also want the approach to be distribution free (see e.g., here), and in general you need a very large calibration set to be able to say with high confidence that there is a high probability that your specific interval contains the true Yn+1.

—When we talk about needing prediction bands for decisions, we are often talking about scenarios where the decisions we want to make from the uncertainty quantification are going to change the distribution and violate the exchangeability criterion. 

—Additionally, in many of the settings where we might imagine using prediction sets there is potential for recourse. If the prediction is bad, resulting in a bad action being chosen, the action can be corrected, i.e., “If you have multiple stages of recourse, it almost doesn’t matter if your prediction bands were correct. What matters is whether you can do something when your predictions are wrong.”

Recht also criticizes research on conformal prediction as being fixated on the ability to make guarantees, irrespective of how useful the resulting intervals are. E.g., we can produce sets with 95% coverage with only two points, and the guarantees are always about coverage instead of the width of the interval.

These are valid points, worth discussing given how much attention conformal prediction has gotten lately. Some of the concerns remind me of the same complaints we often hear about traditional confidence intervals we put on parameter estimates, where the guarantees we get (about the method) are also generally not what we want (about the interval itself) and only actually summarize our uncertainty when the assumptions we made in inference are all good, which we usually can’t verify. A conformal prediction interval is about uncertainty in a model’s prediction on a specific instance, which perhaps makes it more misleading to some people given that it might not be conditional on anything specific to the instance. Still, even if the guarantees don’t stand as stated, I think it’s difficult to rule out an approach without evaluating how it gets used. Given that no method ever really quantifies all of our uncertainty, or even all of the important sources of uncertainty, the “meaning” of an uncertainty quantification really depends on its use, and what the alternatives considered in a given situation are. So I guess I disagree that one can answer the question “Can conformal prediction achieve the uncertainty quantification we need for decision-making?” without considering the specific decision at hand, how we are constructing the prediction set exactly (since there are ways to condition the guarantees on some instance-specific information), and how the decision would be made without a prediction set.

There are various scenarios where prediction sets are used without a human in the loop, like to get better predictions or directly calibrate decisions, where it seems hard to argue that they’re not adding value over incorporating no uncertainty quantification at all. Conformal prediction for alignment purposes (e.g., controlling the factuality or toxicity of LLM outputs) seems to be on the rise. However, I want to focus here on a scenario where we are directly presenting a human with the sets. One type of setting where I’m curious whether conformal prediction sets could be useful is one where 1) models are trained offline and used to inform people’s decisions, and 2) it’s hard to rigorously quantify the uncertainty in the predictions using anything the model produces internally, like softmax values, which can be overfit to the training sample.

For example, a doctor needs to diagnose a skin condition and has access to a deep neural net trained on images of skin conditions for which the illness has been confirmed. Even if this model appears to be more accurate than the doctor on evaluation data, the hospital may not be comfortable deploying the model in place of the doctor. Maybe the doctor has access to additional patient information that may in some cases allow them to make a better prediction, e.g., because they can decide when to seek more information through further interaction or monitoring of the patient. This means the distribution does change upon acting on the prediction, and I think Recht would say there is potential for recourse here, since the doctor can revise the treatment plan over time (he lists a similar example here). But still, at any given point in time, there’s a model and there’s a decision to be made by a human.    

Giving the doctor information about the model’s confidence in its prediction seems like it should be useful in helping them appraise the prediction in light of their own knowledge. Similarly, giving them a prediction set rather than a single top-1 prediction seems potentially preferable so they don’t anchor too heavily on any one prediction. Deep neural nets for medical diagnoses can do better than many humans in certain domains while still having relatively low top-1 accuracy (e.g., here).

A naive thing to do would be to just choose some number k of predictions from the model we think a doctor can handle seeing at once, and show the top-k with softmax scores. But an adaptive conformal prediction set seems like an improvement in that at least you get some kind of guarantee, even if it’s not specific to your particular set like you want. Set size conveys information about the level of uncertainty like the width of a traditional confidence interval does, which seems more likely to be helpful for conveying relative uncertainty than holding set size constant and letting the coverage guarantee change (I’ve heard from at least one colleague who works extensively with doctors that many are pretty comfortable with confidence intervals). We can also take steps toward the conditional coverage that we actually want by using an algorithm that calibrates the guarantees over different classes (labels), or that achieves a relaxed version of conditional coverage, possibilities that Recht seems to overlook (see the sketch after this paragraph).
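
As a rough illustration of that last point, here is one way class-conditional calibration could look: a separate threshold per label, so coverage is controlled within each class rather than only marginally. Again, this is a hedged sketch under my own naming conventions (classwise_conformal_sets, etc.), not a specific published algorithm.

```python
import numpy as np

def classwise_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Class-conditional ("Mondrian") split conformal prediction sets."""
    n, num_classes = cal_probs.shape
    cal_scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # One threshold per class; default to 1.0 (always include the label)
    # when a class has no calibration examples to calibrate on
    q_hat = np.ones(num_classes)
    for k in range(num_classes):
        scores_k = cal_scores[cal_labels == k]
        n_k = len(scores_k)
        if n_k > 0:
            level = min(1.0, np.ceil((n_k + 1) * (1 - alpha)) / n_k)
            q_hat[k] = np.quantile(scores_k, level, method="higher")
    # Label k enters the set when its score is at or below its own threshold
    return [np.where(1.0 - p <= q_hat)[0] for p in test_probs]
```

The cost is that each per-class threshold now rests on far fewer calibration points, which is consistent with Recht’s point that conditional-style guarantees demand a lot of calibration data.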

So while I agree with all the limitations, I don’t necessarily agree with Recht’s concluding sentence here:

“If you have multiple stages of recourse, it almost doesn’t matter if your prediction bands were correct. What matters is whether you can do something when your predictions are wrong. If you can, point predictions coupled with subsequent action are enough to achieve nearly optimal decisions.” 

It seems possible that seeing a prediction set (rather than just a single top prediction) will encourage a doctor to consider other diagnoses that they may not have thought of. Presenting uncertainty often has _some_ effect on a person’s reasoning process, even if they can revise their behavior later. The effect of seeing more alternatives could be bad in some cases (they get distracted by labels that don’t apply), or it could be good (a hurried doctor recognizes a potentially relevant condition they might have otherwise overlooked). If we allow for the possibility that seeing a set of alternatives helps, it makes sense to have a way to generate them that gives us some kind of coverage guarantee we can make sense of, even if it gets violated sometimes.

This doesn’t mean I’m not skeptical of how much prediction sets might change things over more naively constructed sets of possible labels. I’ve spent a bit of time thinking about how, from the human perspective, prediction sets could or could not add value, and I suspect it’s going to be nuanced, with the real value probably depending on how the coverage responds under realistic changes in distribution. There are lots of questions that seem worth trying to answer in particular domains where models are being deployed to assist decisions. Does it actually matter in practice, such as in a given medical decision setting, for the quality of decisions that are made if the decision-makers are given a set of predictions with coverage guarantees versus a top-k display without any guarantees? And what happens when you give someone a prediction set with some guarantee but there are distribution shifts such that the guarantees you give are not quite right? Are they still better off with the prediction set, or is this worse than just providing the model’s top prediction or top-k with no guarantees? Again, many of the questions could also be asked of other uncertainty quantification approaches; conformal prediction is just easier to implement in many cases. I have more to say on some of these questions based on a recent study we did on decisions from prediction sets, where we compared how accurately people labeled images using them versus other displays of model predictions, but I’ll save that for another post since this is already long.

Of course, it’s possible that in many settings we would be better off using some inherently interpretable model for which we no longer need a distribution-free approach. And ultimately we might be better off if we can better understand the decision problem the human decision-maker faces and apply decision theory to try to find better strategies, rather than leaving it up to the human how to combine their knowledge with what they get from a model prediction. I think we still barely understand how this occurs even in high-stakes settings that people often talk about.

Stabbers gonna stab — fraud edition

One of the themes of Dan Davies’s book, Lying for Money, was that fraudsters typically do their crimes over and over again, until they get caught. And then, when they are released from prison, they do it again. This relates to something I noticed in the Theranos story, which was that the fraud was in plain sight for many years and the fraudsters continued to operate in the open.

Also regarding that interesting overlap of science and business fraud, I noted:

There seem to have been two ingredients that allowed Theranos to work. And neither of these ingredients involved technology or medicine. No, the two things were:

1. Control of the narrative.

2. Powerful friends.

Neither of these came for free. Theranos’s leaders had to work hard, for long hours, for years and years, to maintain control of the story and to attract and maintain powerful friends. And they needed to be willing to lie.

The newest story

Ben Mathis-Lilley writes:

On Wednesday, the Department of Justice announced that it has arrested a 48-year-old Lakewood, New Jersey, man named Eliyahu “Eli” Weinstein on charges of operating, quote, “a Ponzi scheme.” . . . How did authorities know that Weinstein was operating a Ponzi scheme? For one thing, he allegedly told associates, while being secretly recorded, that he had “Ponzied” the money they were using to repay investors. . . . Weinstein is further said to have admitted while being recorded that he had hidden assets from federal prosecutors. (“I hid money,” he is said to have told his conspirators, warning them that they would “go to jail” if anyone else found out.) . . .

These stories of “least competent criminals” are always fun, especially when the crime is nonviolent so you don’t have to think too hard about the victims.

What brings this one to the next level is the extreme repeat-offender nature of the criminal:

There was also one particular element of Weinstein’s background that may have alerted the DOJ that he was someone to keep an eye on—namely, that he had just been released from prison after serving eight years of a 24-year sentence for operating Ponzi schemes. More specifically, Weinstein was sentenced to prison for operating a Ponzi scheme involving pretend real estate transactions, then given a subsequent additional sentence for operating a second Ponzi scheme, involving pretend Facebook stock purchases, that he conducted after being released from custody while awaiting trial on the original charges.

Kinda like when a speeding driver runs over some kid and then it turns out the driver had 842 speeding tickets and the cops had never taken away his car, except in this case there’s no dead kid and the perp had already received a 24-year prison sentence.

How is it that he got out after serving only 8 years, anyway?

In January 2021, Weinstein was granted clemency by President Donald Trump at the recommendation of, among others, “the lawyer Alan Dershowitz,” who has frequently been the subject of news coverage in recent years for his work representing Trump and his relationship with the late Jeffrey Epstein.

Ahhhhh.

This all connects to my items #1 and 2 above.

The way Weinstein succeeded (to the extent he could be considered a success) at fraud was control of the narrative. And he got his get-out-of-jail-free card from his powerful friends. “Finding your roots,” indeed.

Stabbers gonna stab

This all reminded me of a story that came out in the newspaper a few decades ago. Jack Henry Abbott was a convicted killer who published a book while in prison. Abbott’s book was supposed to be very good, and he was subsequently released on parole with the support of various literary celebrities including Norman Mailer. Shortly after his release, Abbott murdered someone else and returned to prison, where he spent the rest of his life.

The whole story was very sad, but what made it particularly bizarre was that Abbott’s first murder was a stabbing, his second murder was a stabbing, and his most prominent supporter, Mailer, was notorious for . . . stabbing someone.

A gathering of the literary critics: Louis Menand and Thomas Mallon, meet Jeet Heer

Marshall McLuhan: The environment is not visible. It’s information. It’s electronic.

Norman Mailer: Well, nonetheless, nature still exhibits manifestations which defy all methods of collecting information and data. For example, an earthquake may occur, or a tidal wave may come in, or a hurricane may strike. And the information will lag critically behind our ability to control it.

Regular readers will know that I’m a big fan of literary criticism.  See, for example,

“End of novel. Beginning of job.”: That point at which you make the decision to stop thinking and start finishing

Contingency and alternative history (followup here)

Kazin to Birstein to a more general question of how we evaluate people’s character based on traits that might, at least at first glance, appear to be independent of character (followup here)

“Readability” as freedom from the actual sensation of reading

Things that I like that almost nobody else is interested in

Anthony West’s literary essays

I recently came across a book called “Sweet Lechery: Reviews, Essays and Profiles,” by literary journalist Jeet Heer. The “Lechery” in the title is a bit misleading, but, yes, Heer is open about sexual politics. In any case, like the best literary critics, he engages with the literary works and the authors in the context of politics and society. He has some of the overconfidence of youth—the book came out ten years ago, and some of its essays are from ten or more years before that—and there’s a bunch of obscure Canadian stuff that doesn’t interest me, but overall I found the writing fun and the topics interesting.

One good thing about the book was its breadth of cultural concerns, including genre and non-genre literature, political writing, and comic books, with the latter taken as of interest in themselves, not merely as some sort of cultural symbol.

I also appreciated that he didn’t talk about movies or pop music. I love movies and pop music, but they’re also such quintessential topics for Boomer critics who want to show their common touch. There are enough other places where I can read about how Stevie Wonder and Brian Wilson are geniuses, whether Alex Chilton is over- or under-rated, appreciations of obscure records and gritty films from the 1970s, etc.

My comparison point here is Louis Menand’s book on U.S. cold war culture from 1945 to 1965, which made me wonder how he decided what to leave in and what to leave out. I’m a big fan of Menand—as far as I’m concerned, he can write about whatever he wants to write about—it was just interesting to consider all the major cultural figures he left out, even while considering the range of characters he included in that book. Heer writes about Philip Roth but also about John Maynard Keynes; he’s not ashamed to write about, and take seriously, high-middlebrow authors such as John Updike and Alice Munro, while also finding time to write thoughtfully about Robert Heinlein and Philip K. Dick. I was less thrilled with his writing about comics, not because of anything he said that struck me as wrong, exactly, but rather because he edged into a boosterish tone, promotion as much as criticism.

Another comparison from the New Yorker stable of writers is Thomas Mallon, who notoriously wrote this:

[Screenshot of the Mallon passage in question]

Thus displaying his [Mallon’s] ignorance of Barry Malzberg, who has similarities with Mailer both in style and subject matter. I guess that Malzberg was influenced by Mailer.

And, speaking of Mailer, who’s written some good things but I think was way way overrated by literary critics during his lifetime—I’m not talking about sexism here, I just think there were lots of other writers of his time who had just as much to say and could say it better, with more lively characters, better stories, more memorable turns of phrase, etc.—anyway, even though I’m not the world’s biggest Mailer fan, I did appreciate the following anecdote, which appeared, appropriately enough, in an essay by Heer about Canadian icon Marshall McLuhan:

Connoisseurs of Canadian television should track down a 1968 episode of a CBC program called The Summer Way, a highbrow cultural and political show that once featured a half-hour debate about technology between McLuhan and the novelist Norman Mailer. . . .

McLuhan: We live in a time when we have put a man-made satellite environment around the planet. The planet is no longer nature. It’s no longer the external world. It’s now the content of an artwork. Nature has ceased to exist.

Mailer: Well, I think you’re anticipating a century, perhaps.

McLuhan: But when you put a man-made environment around the planet, you have in a sense abolished nature. Nature from now on has to be programmed.

Mailer: Marshall, I think you’re begging a few tremendously serious questions. One of them is that we have not yet put a man-made environment around this planet, totally. We have not abolished nature yet. We may be in the process of abolishing nature forever.

McLuhan: The environment is not visible. It’s information. It’s electronic.

Mailer: Well, nonetheless, nature still exhibits manifestations which defy all methods of collecting information and data. For example, an earthquake may occur, or a tidal wave may come in, or a hurricane may strike. And the information will lag critically behind our ability to control it.

McLuhan: The experience of that event, that disaster, is felt everywhere at once, under a single dateline.

Mailer: But that’s not the same thing as controlling nature, dominating nature, or superseding nature. It’s far from that. Nature still does exist as a protagonist on this planet.

McLuhan: Oh, yes, but it’s like our Victorian mechanical environment. It’s a rear-view mirror image. Every age creates as a utopian image a nostalgic rear-view mirror image of itself, which puts it thoroughly out of touch with the present. The present is the enemy.

That’s great! I love how McLuhan keeps saying these extreme but reasonable-sounding things and then, each time, Mailer brings him down to Earth. Norman Mailer, who built much of a career on bloviating philosophizing, is the voice of reason here. The snippet that I put at the top of this post is my favorite: McLuhan as glib Bitcoin bro, Mailer as the grizzled dad who has to pay the bills and fix the roof after the next climate-induced hurricane.

Heer gets it too, writing:

It’s a measure of McLuhan’s ability to recalibrate the intellectual universe that in this debate, Mailer—a Charlie Sheen–style roughneck with a history of substance abuse, domestic violence, and public mental breakdowns—comes across as the voice of sobriety and sweet reason.

Also, Heer’s a fan of Uncle Woody!