Power-law models for response times in correspondence: a mysterious controversy

Today in the Collective Dynamics Seminar, Gueorgi will be discussing a set of papers on the distribution of waiting times for correspondence. The topic apparently is controversial. I can’t quite see where the controversy lies, although it’s clear from reading the language of the comments and replies that some people are getting angry. I’ll first give the references, then my thoughts:

Barabasi, The origin of bursts and heavy tails in human dynamics
Oliveira & Barabasi, Human dynamics: Darwin and Einstein correspondence patterns
Vazquez, Oliveira, Dezso, Goh, Kondor, Barabasi, Modeling bursts and heavy tails in human dynamics
Stouffer, Malmgren & Amaral, Comment on Barabasi
Barabasi et al., Reply to Comment
Cosma Shalizi’s commentary
Aaron Clauset’s commentary

The short story is that Barabasi et al. have analyzed various datasets of time lags in personal correspondence and found that they follow a long-tailed, power-law type of distribution (including some very long waits between replies). Here’s a pretty picture of waiting times for correspondence of Darwin and Einstein, following something close to a power-law distribution:

powerlaw1.jpg

Here’s another picture from Barabasi, this time of waiting times for email replies:

powerlaw2.jpg

Stouffer then has a pretty picture of his own here (sorry, I can’t figure out how to cut-and-paste it) of some email waiting times that don’t fit a power-law model very well. There’s some discussion of Bayesian model selection here, but I don’t quite see the point of that, since obviously no model will be “correct” here.

This reminds me . . .

Looking at these papers reminds me of a few things. When Tian and Matt and I were working on our “How many people do you know” paper, we fit a lognormal distribution to the “gregariousness” parameters, which basically represent the size of people’s social networks. Matt made an offhand comment one day that some people might be interested in seeing if a power-law distribution fit these data, but we decided that the information on the high tails was so sparse (basically, we estimate someone as knowing 3000 people if he or she reported knowing several people named Michael, several named Nicole, etc.) that we couldn’t really try to distinguish these distributions and wouldn’t believe it if we tried. (See here for more on this point.)
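To make the difficulty concrete, here is a minimal sketch of why a lognormal can pass for a power law over a limited range. On a log-log plot a pure power law is a straight line, so the question is how fast the lognormal's local slope drifts. The parameter values (`MU`, `SIGMA`) and the function names are illustrative assumptions, not estimates from any dataset discussed here.

```python
import math

def lognormal_logpdf(x, mu, sigma):
    """Log density of a lognormal with log-scale mean mu and log-scale sd sigma."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z

def loglog_slope(x, mu, sigma, h=1e-5):
    """Numerical slope of log f(x) against log x (constant for a pure power law)."""
    up = lognormal_logpdf(x * math.exp(h), mu, sigma)
    down = lognormal_logpdf(x * math.exp(-h), mu, sigma)
    return (up - down) / (2 * h)

# Illustrative values only (assumed, not fitted to anything).
MU, SIGMA = 0.0, 3.0
slopes = {x: loglog_slope(x, MU, SIGMA) for x in (10, 100, 1000)}
for x, s in slopes.items():
    print(f"x = {x:>4}: local log-log slope = {s:.2f}")
```

With these assumed values the slope drifts only from about -1.26 at x = 10 to about -1.77 at x = 1000. With a handful of noisy observations in the tail, that gentle curvature would be hard to tell apart from a straight line.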

My other memory relates to a different aspect of the problem. Almost 20 years ago, a friend in graduate school was working for the summer at a federal agency under the supervision of a Ph.D. statistician. My friend’s boss put him to the task of fitting normal distributions (or maybe it was lognormal distributions) to some variable (perhaps it was income; for the purpose of the story it doesn’t matter) in the general population, and then separately for men and women, and other subsets of the population. My friend countered that, if you have a normal distribution for men and a normal distribution for women, the mixture won’t be normal. But his boss didn’t want to hear it.

Similarly, I can’t see that you’d expect to see the same distributional family for all correspondence as for correspondence of a single individual, and I didn’t see this addressed in these articles. From the displayed graphs, it looked like Stouffer et al. and Barabasi et al. were analyzing different datasets, so I can’t quite see where the disagreement is coming from. There’s probably something I’m missing here.
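The point about mixtures is easy to check. Here is a minimal sketch that computes the excess kurtosis of a two-component normal mixture from the standard moment formulas. A normal distribution has excess kurtosis exactly zero, so any other value means the mixture is not normal. The function name and the example means and standard deviations are hypothetical, chosen only for illustration.

```python
def mixture_excess_kurtosis(w, mu1, sd1, mu2, sd2):
    """Excess kurtosis of the mixture w*N(mu1, sd1^2) + (1-w)*N(mu2, sd2^2)."""
    weights = (w, 1.0 - w)
    means = (mu1, mu2)
    sds = (sd1, sd2)
    mean = sum(p * m for p, m in zip(weights, means))
    var = 0.0
    m4 = 0.0
    for p, m, s in zip(weights, means, sds):
        d = m - mean  # offset of this component's mean from the overall mean
        var += p * (s**2 + d**2)
        # Fourth central moment of N(m, s^2) taken about the overall mean.
        m4 += p * (3 * s**4 + 6 * s**2 * d**2 + d**4)
    return m4 / var**2 - 3.0

# Two equally weighted groups with different means (assumed values).
print(mixture_excess_kurtosis(0.5, -2.0, 1.0, 2.0, 1.0))
# Identical components collapse to a single normal, so the result is zero.
print(mixture_excess_kurtosis(0.5, 0.0, 1.0, 0.0, 1.0))
```

For the first example the variance is 5 and the fourth central moment is 43, giving an excess kurtosis of 43/25 - 3 = -1.28, so the pooled distribution is flatter than any normal. The same logic suggests that pooling many correspondents need not preserve whatever distributional family describes one person.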

3 Comments

  1. It should be easy to collect data on email response times: every single person who uses email has this information stored in their email archives. Individuals could download and install plugins for their email programs; alternatively, webmail providers like MSN, Yahoo, or Google have all this (privacy-invasive!) info on their servers already (and Google at least seems to have the statistical talent to do this analysis!).

    Microsoft Research's Priorities system tracked user behavior to figure out which emails were most valuable; presumably response time should be a useful input variable.

    I bet a plugin that provides useful features similar to Priorities could also collect anonymous data on many users' email habits that could settle this dispute, and also provide better priors for email classification/value rating performance.

  2. Cosma says:

    "From the displayed graphs, it looked like Stouffer et al. and Barabasi et al. were analyzing different datasets, so I can't quite see where the disagreement is coming from."

    Actually, the first Barabasi paper is analyzing the same data as Stouffer et al.; it was originally gathered by Eckmann et al. for this. Darwin and Einstein are cute, but they came later.

  3. jrse says:

    Glad to see that at least one physicist has found comfortable post cold-war employment! Jeesh, I can't make heads or tails of it either, but I imagine it might take a lifetime to figure it out!