
2 quick calls

Kevin Lewis asks what I think of these:

Study 1:
Using footage from body-worn cameras, we analyze the respectfulness of police officer language toward white and black community members during routine traffic stops. We develop computational linguistic methods that extract levels of respect automatically from transcripts, informed by a thin-slicing study of participant ratings of officer utterances. We find that officers speak with consistently less respect toward black versus white community members, even after controlling for the race of the officer, the severity of the infraction, the location of the stop, and the outcome of the stop. Such disparities in common, everyday interactions between police and the communities they serve have important implications for procedural justice and the building of police-community trust.

Study 2:
Exposure to parental separation or divorce during childhood has been associated with an increased risk for physical morbidity during adulthood. Here we tested the hypothesis that this association is primarily attributable to separated parents who do not communicate with each other. We also examined whether early exposure to separated parents in conflict is associated with greater viral-induced inflammatory response in adulthood and in turn with increased susceptibility to viral-induced upper respiratory disease. After assessment of their parents’ relationship during their childhood, 201 healthy volunteers, age 18-55 y, were quarantined, experimentally exposed to a virus that causes a common cold, and monitored for 5 d for the development of a respiratory illness. Monitoring included daily assessments of viral-specific infection, objective markers of illness, and local production of proinflammatory cytokines. Adults whose parents lived apart and never spoke during their childhood were more than three times as likely to develop a cold when exposed to the upper respiratory virus than adults from intact families. Conversely, individuals whose parents were separated but communicated with each other showed no increase in risk compared with those from intact families. These differences persisted in analyses adjusted for potentially confounding variables (demographics, current socioeconomic status, body mass index, season, baseline immunity to the challenge virus, affectivity, and childhood socioeconomic status). Mediation analyses were consistent with the hypothesis that greater susceptibility to respiratory infectious illness among the offspring of noncommunicating parents was attributable to a greater local proinflammatory response to infection.

My reply:

1. I’d run this by a computational linguist who doesn’t have a stake in this example. I’m skeptical in any case because this kind of “respect” thing is contextual. I mean, sure, I believe that building of trust is important; I just don’t know if much is gained by the “extract levels of respect automatically” thing.

2. I’ll believe this one after it appears in an independent preregistered replication, not before.


  1. Computational linguist at your service.

    (1) is a lot to digest. The authors follow an involved process of collecting and filtering data, having it transcribed, then having it rated by humans on a 1-4 scale on five dimensions (formal, friendly, impartial, polite, respectful). Then the human ratings are somehow used to train a GLM driven in part by the standard kinds of linguistic feature extraction (including a lot of curated lists of words, phrases or types chosen to be relevant and discriminative for their dimensions). They then evaluate the accuracy of their GLMs against a couple of data-driven principal components. It’s not a clean generative Bayesian model, which makes it very hard to analyze using my current approach to model criticism. What I would’ve liked to have seen is some cross-validation within the training set on the accuracy and biases of the GLMs (classifiers); maybe it’s there somewhere between the paper and appendix and I missed it. The reason this matters is that if the classifiers have systematic biases, we want to adjust for them in drawing conclusions about population prevalences. It’s standard operating procedure for diagnostic testing in epidemiology.

    The authors then apply their GLM to the whole set of around N1 = 700 black, N2 = 300 white transcribed stops. They then present a lot of tables in the reject-the-zero-effect p < 0.001 tradition and consider alternative models, pointing out that they still produce p < 0.001 effects.

    I’d be curious as to what result they’d have gotten from having their original data raters spread out over a few more transcripts coupled with a noisy measurement model like Dawid and Skene’s. It would answer Andrew’s questions about how much of the linguistic content was useful in drawing the conclusion in their title (“language from police body camera footage shows racial disparities”).

    One thing I didn’t see them adjust for is how respectful, etc., the person being stopped was. I seem to recall there being an effect of speakers trying to match each other’s tone, tempo, etc.—no idea what the thinking on that is these days. Again, maybe I missed it.

    The sociolinguistic discussion sections in the paper and methodology appendix are fascinating and highlight just how challenging this kind of modeling is.

  2. Mason says:

    > I’ll believe this one after it appears in an independent preregistered replication, not before.

    How am I supposed to adjust my beliefs when a small non-preregistered study comes out? A little bit? Not at all? In the other direction?

  3. Wonks Anonymous says:

    I didn’t think an IRB would allow people to be experimentally exposed to a virus, even if it was just the common cold.

  4. Martha (Smith) says:

    “I’m skeptical in any case because this kind of “respect” thing is contextual.”

    In particular, what one person considers “respect” may not be the same as what another person considers “respect.”
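
The first comment's point about adjusting for classifier bias before drawing conclusions about population prevalences can be illustrated with the standard diagnostic-testing correction from epidemiology (the Rogan-Gladen estimator). This is only a sketch under strong assumptions, not the paper's method: it treats the classifier output as a binary label (say, "respectful" versus "not respectful"), whereas the study uses graded scores, and the sensitivity, specificity, and raw rate below are hypothetical numbers chosen for illustration. The function name `adjusted_prevalence` is likewise made up here.

```python
def adjusted_prevalence(apparent, sensitivity, specificity):
    """Rogan-Gladen correction of an apparent (classifier-reported) rate.

    apparent    -- fraction of items the classifier labels positive
    sensitivity -- P(classifier says positive | truly positive)
    specificity -- P(classifier says negative | truly negative)
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        # A classifier no better than chance carries no usable signal.
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent + specificity - 1.0) / denom
    # Sampling noise can push the estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))


# Hypothetical illustration: the classifier flags 40% of utterances,
# with sensitivity 0.90 and specificity 0.80 estimated from held-out
# human ratings (for example, by cross-validation).
raw_rate = 0.40
corrected = adjusted_prevalence(raw_rate, sensitivity=0.90, specificity=0.80)
print(round(corrected, 3))
```

With these made-up inputs the corrected rate is noticeably lower than the raw 40%, which is the sense in which unmeasured classifier bias can distort a comparison between groups if it is not estimated and adjusted for.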
