I [Mark Liberman] was mostly trying to see whether a new database search program was working. I knew that men have been said to use filled pauses like “uh” more than women, and it made sense to me that disfluency would increase with age, so I generated the data for the first plot and took a look. I think you’re right that I should have started the y axis at 0, but I wasn’t sure what I’d see, and I thought that any qualitative effects would be clearer with a narrower range of values plotted.

Then I wondered about “um”, and still had a few minutes, so I ginned up the data for the second plot and took a look at it. I was quite surprised to see the opposite age effect, and somewhat surprised to see the inverted sex effect, so I quickly looked up the standard papers on the subject and banged out a post.

Actually what I did was to add a bit of verbiage around the .html notes (with embedded graphs) that I’d been making for myself.

I’ve attached the first plot that I made in that session, showing the female/male ratio for a number of words that I thought might show a difference. The x axis is the (log) count of the word (the mean of the counts for male and female speakers), and the y axis is the (log) ratio of female to male counts. The plotted words are too small, but I wasn’t sure how much they would overlap…
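For readers who want to build this kind of plot themselves, here is a minimal R sketch of the two axes as described: x is the log of the mean of the female and male counts, and y is the log of the female/male ratio. The words and counts below are hypothetical placeholders invented for illustration, not data from the database search, and the sketch assumes raw counts with no normalization for how much each group speaks.

```r
# Hypothetical per-word counts, invented for illustration; these are NOT
# the counts from the database search described above.
counts <- data.frame(
  word   = c("word_a", "word_b", "word_c"),
  female = c(120, 300, 45),
  male   = c(200, 150, 45)
)

# x: log of the mean of the female and male counts for each word
x <- log((counts$female + counts$male) / 2)
# y: log of the female/male count ratio (0 means equal counts)
y <- log(counts$female / counts$male)

# Draw an empty frame, then place each word at its (x, y) position
plot(x, y, type = "n",
     xlab = "log mean count", ylab = "log(female / male)")
text(x, y, labels = counts$word, cex = 0.8)
abline(h = 0, lty = 2)  # dashed reference line: no sex difference
```

Plotting the words themselves with `text()` on an empty frame, rather than as points, is what makes overlap between labels a concern.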

If I can find another spare hour or two, I’m going to check out whether southerners really talk slower than northerners.

P.S. In his new plots (see here), Mark uses a 2×2 grid and extends the y-axis to 0. To be really picky, I'd suggest making 0 a "hard boundary." By default R pads the axis range by 4% at each end, so 0 ends up slightly inside the plot rather than on its edge. In R you can turn this off using 'yaxs="i"' in the plot() call, but then the top boundary will be "hard" as well, so you need to use ylim to extend the range (e.g., ylim=c(0,1.05*max(y))).
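A minimal sketch of the suggestion above, using made-up non-negative values (the data are hypothetical). With `yaxs = "i"` the plotting region's y limits, reported by `par("usr")`, match `ylim` exactly, so the bottom sits on 0 and the top leaves 5% of headroom.

```r
# Hypothetical non-negative data, invented for illustration
y <- c(0.8, 1.3, 2.1, 1.7)
x <- seq_along(y)

# yaxs = "i" turns off R's default 4% padding, so the y range is exactly ylim:
# the bottom is pinned at 0, and the top gets 5% headroom above the largest value.
plot(x, y, type = "b", yaxs = "i", ylim = c(0, 1.05 * max(y)),
     xlab = "x", ylab = "y")

par("usr")[3:4]  # y limits of the plotting region
```

Without the `ylim` argument, `yaxs = "i"` alone would put the largest point right on the top edge of the plot.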