More on Um, also on the implementation of start-at-zero

Mark Liberman replied to this entry (see also here):

I [Mark Liberman] was mostly trying to see whether a new database search program was working. I knew that men have been said to use filled pauses like “uh” more than women, and it made sense to me that disfluency would increase with age, so I generated the data for the first plot and took a look. I think you’re right that I should have started the plot from 0, but I wasn’t sure what I’d see, and thought that the qualitative effects if any would be clearer with a narrower range of values plotted.

Then I wondered about “um”, and still had a few minutes, so I ginned up the data for the second plot and took a look at it. I was quite surprised to see the opposite age effect, and somewhat surprised to see the inverted sex effect, so I quickly looked up the standard papers on the subject and banged out a post.

Actually what I did was to add a bit of verbiage around the .html notes (with embedded graphs) that I’d been making for myself.

I’ve attached the first plot that I made in that session, showing the female/male ratio for a number of words that I thought might show a difference. The X axis is the (log) count of the word (mean of counts for male and female speakers), and the y axis is the (log) ratio of female/male counts. The plotted words are too small, but I wasn’t sure how much they would overlap…

If I can find another spare hour or two, I’m going to check out whether southerners really talk slower than northerners.

And here’s Mark’s new plot:


Here’s the full version. (I don’t know how to fit it all on the blog page.)

P.S. In his new plots (see here), Mark uses a 2×2 grid and extends the y-axis to 0. To be really picky, I’d suggest making 0 a “hard boundary.” In R you can do this using yaxs="i" in the plot() call, but then the top boundary will be “hard” also, so that you have to use ylim to extend the range (e.g., ylim=c(0,1.05*max(y))). What I should really do is write a few R functions to encode my default graphing preferences so that I don’t need to do this crap every time I make a graph.
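Here is a minimal sketch of what such helper functions might look like, assuming base R graphics and nonnegative y values. The names zero_ylim and plot_from_zero are hypothetical, made up for illustration; they are not from Mark's code or from any package.

```r
# Hypothetical helper: y-axis limits that start exactly at 0 and leave
# a little headroom above the largest value (5% by default).
zero_ylim <- function(y, pad = 0.05) {
  c(0, (1 + pad) * max(y, na.rm = TRUE))
}

# Hypothetical wrapper around plot(): yaxs = "i" turns off R's default
# padding of the y-axis, so the axis begins exactly at 0, and ylim adds
# the headroom back at the top only.
# Note: do not pass ylim or yaxs through "..."; they are already set here.
plot_from_zero <- function(x, y, pad = 0.05, ...) {
  plot(x, y, yaxs = "i", ylim = zero_ylim(y, pad), ...)
}

# Example call, run only in an interactive session:
if (interactive()) {
  plot_from_zero(1:10, (1:10)^2, type = "b", xlab = "x", ylab = "y")
}
```

With these defined, plot_from_zero(x, y) can stand in for plot(x, y) whenever the data are counts or rates for which 0 is a natural baseline. It is not appropriate for a log-scale axis, where 0 cannot appear.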


  1. Bob O'H says:

    Odd. There is a bias towards females for "uh-huh", and towards males for "uh-uh". How real is this: are they really different phrases, or is it just pronunciation?

    Oh, and wouldn't it be better to have the ratio on the log scale? It's difficult to compare the sizes of the biases in the different directions.


  2. Anonymous says:

    Ummm…that is a log scale for y, isn't it? And zero isn't included because it can't be.