“How many people do you know in prison?”: Using overdispersion in count data to estimate structure in social networks

Posted on April 1, 2005 12:37 AM by Andrew

I’ll be speaking at Harvard next Monday on some joint work with Tian Zheng, Matt Salganik, Tom DiPrete, and Julien Teitler:

Networks–sets of objects connected by relationships–are important in a number of fields. The study of networks has long been central to sociology, where researchers have attempted to understand the causes and consequences of the structure of relationships in large groups of people. Using insight from previous network research, McCarty, Bernard, Killworth, et al. (1998, 2001) developed and evaluated a method for estimating the sizes of hard-to-count populations using network data collected from a simple random sample of Americans. In this paper we show how, using a multilevel overdispersed Poisson regression model, these data can also be used to estimate aspects of social structure in the population. Our work goes beyond most previous research by using variation as well as average responses to learn about social networks and leads to some interesting results. We apply our method to the McCarty et al. data and find that Americans vary greatly in their number of acquaintances. Further, Americans show great variation in propensity to form ties to people in some groups (e.g., males in prison, the homeless, and American Indians), but little variation for other groups (e.g., people named Michael or Nicole). We also explore other features of these data and consider ways in which survey data can be used to estimate network structure.
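
To make "overdispersion" concrete, here is a minimal sketch in Python. It is not the multilevel model from the paper: it uses a gamma-Poisson mixture, one standard way to generate overdispersed counts, and every name and number in it (`simulate_counts`, the mean of 1 acquaintance per respondent, the overdispersion of 8) is made up for illustration. The point is only that two groups can have the same average count but very different variation across respondents, which is the kind of signal the abstract describes.

```python
import math
import random
import statistics


def poisson_draw(rate, rng):
    """Draw one Poisson(rate) value using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n_respondents, mean_count, overdispersion, rng):
    """Simulate "how many X do you know?" counts (hypothetical helper).

    The counts have mean `mean_count` and variance
    `overdispersion * mean_count`. With overdispersion == 1 this is a
    plain Poisson; above 1, each respondent gets an individual rate
    drawn from a gamma distribution (a gamma-Poisson mixture).
    """
    counts = []
    for _ in range(n_respondents):
        if overdispersion == 1.0:
            rate = mean_count
        else:
            # Gamma with mean = mean_count, variance = mean_count * (overdispersion - 1)
            shape = mean_count / (overdispersion - 1.0)
            scale = overdispersion - 1.0
            rate = rng.gammavariate(shape, scale)
        counts.append(poisson_draw(rate, rng))
    return counts


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    return statistics.variance(counts) / statistics.mean(counts)


rng = random.Random(0)
# Illustrative values only: same average, different spread across respondents.
evenly_spread = simulate_counts(20000, 1.0, 1.0, rng)
clustered = simulate_counts(20000, 1.0, 8.0, rng)

print("evenly spread group:", round(dispersion_ratio(evenly_spread), 2))
print("clustered group:    ", round(dispersion_ratio(clustered), 2))
```

In the clustered group most simulated respondents know nobody and a few know many, so the variance-to-mean ratio sits well above 1 even though the average matches the evenly spread group.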

Our paper is here. And here’s a paper by McCarty, Killworth, Bernard, Johnsen, and Shelley describing some of their work that we used as a starting point. (They estimate average network size at 290, but using their data we get an estimate of 750. The two estimates differ because they correspond to different depths of the social network.) McCarty et al. were very collegial in sharing their data with us, which we reanalyzed using a multilevel model. Here’s a presentation I found on the web from Killworth on this stuff.

Update: Our paper will appear in the Journal of the American Statistical Association.