Brett Pelham (whose research on names and life choices is discussed here and here–based on the work of Pelham and his collaborators, we crudely estimate that about 1% of people in the U.S. choose a career based on their first name) wrote a quick email in response.

Brett writes:

You show that if you meet a dentist named Dennis, the chances that he chose his career based on his name are pretty high, even though very, very few dentists choose their careers based on their names. By the way, one way of getting the kind of more general estimate you referred to would be to look at decisions where it’s easier to look at lots of single-letter overlaps. In our 2004 paper on close relationships, we show, in lots of very big samples, that people are disproportionately likely to marry others whose surnames begin with the same letter as their own. Ignoring the fact that some of these matches involved more than one letter, I’d think you could get a nice estimate of the sort you mentioned based on those data. That paper was Jones, Pelham, Carvallo & Mirenberg, 2004, JPSP (same journal as before).

I do have one question that you might be able to answer about these kinds of data. If you were to compute a simple correlation (or phi) coefficient between a person’s last name and whether the street that person lived on included that name (e.g., Jones living on Jones Avenue), the phi coefficient would obviously be minuscule because of the extreme skew in the data (99.3% of people are not named Jones). But if you pit Jones against Smith and calculate phi coefficients that way, you see something like

                 Surname
Street Name    Jones    Smith
Jones            320       80
Smith             80      320

which would yield a correlation of .60. (These data are simplified and based on my rough memory of the effect size, but I hope the example makes the point.) Effect size estimates are huge if you avoid the problem of skew but tiny if you don’t. I tell my students that this is why you should prefer odds ratios when data are really skewed, but I realized that I’ve never asked a stats expert about this. By the way, the real data with the surname street effects came out in JPSP (Pelham, et al., 2004) in response to an excellent statistical critique by Marcello Gallucci of our original paper.
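
For readers who want to check the arithmetic, here is a minimal sketch of the phi coefficient and the odds ratio for the simplified 2x2 table above. The function names are mine, not from the email, and the counts are Brett's illustrative numbers rather than the published data.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]] (rows: street, columns: surname)."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the same 2x2 table."""
    return (a * d) / (b * c)

# Simplified counts from the email: Jones/Smith surnames on Jones/Smith streets.
phi = phi_coefficient(320, 80, 80, 320)
ratio = odds_ratio(320, 80, 80, 320)
print(f"phi = {phi:.2f}, odds ratio = {ratio:.0f}")
```

With these counts the phi coefficient works out to the .60 quoted in the email, and the corresponding odds ratio is 16.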

My response to Brett:

Regarding your question on odds ratios, it seems to me that there are 2 issues going on:

(1) Using Smith and Jones gives you twice as much info as using only Jones.

(2) If you work with Jones and non-Jones, then the correlation coefficient will be tiny (although it will be statistically significant), thus giving a misleading sense of the effect size.

Regarding issue 2, I think it depends on which of the “effect sizes” you care about. As noted in my blog entry on the topic, the “effect size” conditional on being a Jones on Jones Street is huge, but the “effect size” simply conditional on being a Jones is tiny. So maybe it depends on what you’re looking for. On either scale, it’s statistically significant.
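
Here is a rough sketch of issue 2 using made-up numbers. It takes the simplified Jones/Smith table from the email and multiplies the second surname column by a factor `k`, a hypothetical stand-in for replacing "Smith" with the much larger group "everyone who is not a Jones". The scaled counts are an assumption for illustration only, not real data.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the same table."""
    return (a * d) / (b * c)

# Hypothetical skew: inflate the second column (b and d) by a factor k.
results = []
for k in (1, 10, 100, 1000):
    a, b, c, d = 320, 80 * k, 80, 320 * k
    results.append((k, phi_coefficient(a, b, c, d), odds_ratio(a, b, c, d)))

for k, phi, ratio in results:
    print(f"k = {k:>4}: phi = {phi:.3f}, odds ratio = {ratio:.0f}")
```

In this sketch the odds ratio stays at 16 for every `k`, because scaling a whole column cancels in the cross-product, while phi shrinks steadily as the margins become more lopsided. That is one way to see why the two summaries can give such different impressions of the same association.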

This seems like an interesting statistical topic–or, to put it another way, something that’s already arisen in biostatistics, I’m sure.