It’s Christmas so what better time to write about Charles Dickens . . .
In traditional survey research we have been spoiled. If you work with atomistic data structures, a small sample looks like a little bit of the population. But a small sample of a network doesn’t look like the whole. If you take a network, randomly sample some nodes, and then look at the network of all the edges connecting these nodes, you’ll get something much more sparse than the original. For example, suppose Alice knows Bob who knows Cassie who knows Damien, but Alice does not happen to know Damien directly. If only Alice and Damien are selected, they will appear to be disconnected because the links that join them run through people who are not in the sample.
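Here is a minimal simulation sketch of that thinning effect. Everything in it is made up for illustration: the network is a set of edges drawn uniformly at random (real social networks are clumpier than this), and the sizes (2,000 nodes, 20,000 edges, a sample of 100) and helper names are arbitrary choices, not anything from a real dataset.

```python
import random

def random_network(num_nodes, num_edges, rng):
    """Return a set of undirected edges (i, j), i < j, drawn uniformly at random."""
    edges = set()
    while len(edges) < num_edges:
        i, j = rng.sample(range(num_nodes), 2)  # two distinct nodes
        edges.add((min(i, j), max(i, j)))
    return edges

def induced_edges(edges, sampled_nodes):
    """Keep only the edges whose two endpoints were both sampled."""
    keep = set(sampled_nodes)
    return {(i, j) for (i, j) in edges if i in keep and j in keep}

def mean_degree(num_nodes, edges):
    """Average number of connections per node."""
    return 2 * len(edges) / num_nodes

rng = random.Random(0)            # fixed seed so the run is repeatable
N, n = 2000, 100                  # hypothetical population and sample sizes
edges = random_network(N, 20000, rng)
sample = rng.sample(range(N), n)  # simple random sample of nodes
sub = induced_edges(edges, sample)

full_deg = mean_degree(N, edges)
sample_deg = mean_degree(n, sub)
print(f"mean degree in full network: {full_deg:.2f}")
print(f"mean degree within sample:   {sample_deg:.2f}")
```

In the full network everyone has 20 connections on average; within the sample, most of each person's connections lead to people who were not drawn, so the average within the sample comes out far lower.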
This brings us to a paradox of literature. Charles Dickens, like Tom Wolfe more recently, was celebrated for his novels that reconstructed an entire society, from high to low, in miniature. But Dickens is also notorious for his coincidences: his characters all seem very real but they’re always running into each other on the street (as illustrated in the map above, which comes from David Perdue) or interacting with each other in strange ways, or it turns out that somebody is somebody else’s uncle. How could this be, that Dickens’s world was so lifelike in some ways but filled with these unnatural coincidences?
My contention is that Dickens was coming up with his best solution to an unsolvable problem, which is to reproduce a network given a small sample. What is a representative sample of a network? If London has a million people and I take a sample of 100, what will their network look like? It will look diffuse and atomized because of all those missing connections. The network of this sample of 100 doesn’t look anything like the larger network of Londoners, any more than a disconnected set of human cells would look like a little person.
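To put rough numbers on the London example, here is a back-of-the-envelope sketch. It assumes a simple random sample and that each Londoner knows some fixed average number of other Londoners; the figure of 500 acquaintances is purely hypothetical, chosen only to make the arithmetic concrete. Under those assumptions, each acquaintance of a sampled person lands in the sample with probability (n - 1)/(N - 1).

```python
def expected_sample_degree(mean_degree, population_size, sample_size):
    """Expected number of a sampled person's acquaintances who are also sampled,
    under simple random sampling without replacement."""
    return mean_degree * (sample_size - 1) / (population_size - 1)

london = 1_000_000           # population size from the example in the text
sample_size = 100            # sample size from the example in the text
assumed_acquaintances = 500  # hypothetical average, not a real estimate

deg = expected_sample_degree(assumed_acquaintances, london, sample_size)
expected_links = deg * sample_size / 2  # each link is shared by two people

print(f"expected acquaintances within the sample, per person: {deg:.4f}")
print(f"expected links among all {sample_size} sampled people: {expected_links:.2f}")
```

With these made-up inputs, a typical sampled person knows about 0.05 of the other 99, and the whole cast of 100 shares only two or three links in total. A novel with that cast would be a hundred strangers.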
So to construct something with realistic network properties, Dickens had to artificially fill in the network, to create the structure that would represent the interactions in society. You can’t make a flat map of the world that captures the shape of a globe; any projection makes compromises. Similarly, you can’t take a sample of people and capture all the network properties of the population, even in expectation: if we want each person in the sample to have a realistic number of connections, we need to add in links, “coincidences” as it were. The problem is, we’re not used to thinking this way because with atomized analysis, we really can create samples that are basically representative of the population. With networks you can’t.
This may be the first, and last, bit of literary criticism to appear in the Journal of Survey Statistics and Methodology.