Christmas special: Survey research, network sampling, and Charles Dickens’ coincidences

It’s Christmas so what better time to write about Charles Dickens . . .

Here’s the story:

In traditional survey research we have been spoiled. If you work with atomistic data structures, a small sample looks like a little bit of the population. But a small sample of a network doesn’t look like the whole. If you take a network, randomly sample some nodes, and then look at the network of all the edges connecting these nodes, you’ll get something much sparser than the original. For example, suppose Alice knows Bob who knows Cassie who knows Damien, but Alice does not happen to know Damien directly. If only Alice and Damien are selected, they will appear to be disconnected because the intermediate links, through Bob and Cassie, are not in the sample.
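
To see how much sparser, here is a minimal simulation sketch. It is not part of the original analysis: it assumes a purely random acquaintance graph (real social networks are clustered, so this is only illustrative), and the function names and sizes (2,000 people, 20,000 ties, a sample of 100) are made up for the example.

```python
import random


def random_acquaintance_graph(n_nodes, n_edges, rng):
    """Build an undirected graph with n_edges distinct random ties (hypothetical model)."""
    adj = {v: set() for v in range(n_nodes)}
    added = 0
    while added < n_edges:
        a, b = rng.sample(range(n_nodes), 2)  # two distinct people
        if b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            added += 1
    return adj


def induced_subgraph(adj, nodes):
    """Keep only the sampled nodes and the ties that connect two sampled nodes."""
    keep = set(nodes)
    return {v: adj[v] & keep for v in keep}


def mean_degree(adj):
    """Average number of ties per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


# The Alice-Bob-Cassie-Damien chain from the text.
chain = {
    "Alice": {"Bob"},
    "Bob": {"Alice", "Cassie"},
    "Cassie": {"Bob", "Damien"},
    "Damien": {"Cassie"},
}
pair = induced_subgraph(chain, ["Alice", "Damien"])  # both end up with no ties

# A larger illustration with assumed sizes.
rng = random.Random(0)
population = random_acquaintance_graph(2000, 20000, rng)
sample_nodes = rng.sample(sorted(population), 100)
sample = induced_subgraph(population, sample_nodes)

print("mean degree, population:", mean_degree(population))
print("mean degree, sample of 100:", mean_degree(sample))
```

In this setup each person has 20 acquaintances on average in the population, while the expected average within the sample is about 20 × 99/1999, or roughly one.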

This brings us to a paradox of literature. Charles Dickens, like Tom Wolfe more recently, was celebrated for his novels that reconstructed an entire society, from high to low, in miniature. But Dickens is also notorious for his coincidences: his characters all seem very real but they’re always running into each other on the street (as illustrated in the map above, which comes from David Perdue) or interacting with each other in strange ways, or it turns out that somebody is somebody else’s uncle. How could this be, that Dickens’s world was so lifelike in some ways but filled with these unnatural coincidences?

My contention is that Dickens was coming up with his best solution to an unsolvable problem, which is to reproduce a network given a small sample. What is a representative sample of a network? If London has a million people and I take a sample of 100, what will their network look like? It will look diffuse and atomized because of all those missing connections. The network of this sample of 100 doesn’t look anything like the larger network of Londoners, any more than a disconnected set of human cells would look like a little person.
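
A back-of-the-envelope version of the London example, as a rough sketch: under simple random sampling of N people down to n, each acquaintance of a sampled person lands in the sample with probability (n - 1)/(N - 1). The figure of 150 acquaintances per Londoner is an assumption chosen for illustration, not a number from the post, and the function names are hypothetical.

```python
def expected_in_sample_degree(mean_degree, population_size, sample_size):
    """Expected number of a sampled person's acquaintances who are also sampled."""
    return mean_degree * (sample_size - 1) / (population_size - 1)


def expected_in_sample_ties(mean_degree, population_size, sample_size):
    """Expected number of ties among the sampled people (each tie counted once)."""
    per_person = expected_in_sample_degree(mean_degree, population_size, sample_size)
    return sample_size * per_person / 2


ASSUMED_ACQUAINTANCES = 150  # illustrative assumption, not from the post
LONDON = 1_000_000           # population size used in the text
SAMPLE = 100                 # sample size used in the text

print(expected_in_sample_degree(ASSUMED_ACQUAINTANCES, LONDON, SAMPLE))
print(expected_in_sample_ties(ASSUMED_ACQUAINTANCES, LONDON, SAMPLE))
```

With these assumed numbers, the 100 sampled Londoners are expected to share fewer than one acquaintance tie in total.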

So to construct something with realistic network properties, Dickens had to artificially fill in the network, to create the structure that would represent the interactions in society. You can’t make a flat map of the world that captures the shape of a globe; any projection makes compromises. Similarly, you can’t take a sample of people and capture all the network properties of the population, even in expectation: if we want the network density to be correct, we need to add in links, “coincidences” as it were. The problem is, we’re not used to thinking this way because with atomized analysis, we really can create samples that are basically representative of the population. With networks you can’t.

This may be the first, and last, bit of literary criticism to appear in the Journal of Survey Statistics and Methodology.

5 Comments

  1. I am intrigued by the idea that the “coincidence” in Dickens’s novels does not distort reality but instead corrects for the distortion inherent in a sample. That is, by distorting a distortion, it captures truth. This makes a lot of sense.

  2. Slugger says:

    I am in the minority that doesn’t think very highly of Dickens. A society can be visualized as a fractal where a sample does reflect the whole. This was the approach of many novelists contemporary with Dickens whose skills seem to me greater than his. A family can expose a whole society in the hands of a Tolstoy or a Dostoyevsky, a group of coal miners reflects an economic system in the hands of a Zola, and humanity’s relationship to the mysteries of nature can be captured by a few shipmates by Melville. Dickens was immensely popular because he did write all those gratifying coincidences, but he was no Balzac nor Flaubert. The structure of a society and the relationship of man with man can be exposed by a writer focusing on a small group.
    I truly suffered reading Great Expectations as a fourteen year old and have never forgiven.

  3. Mark Palko says:

    “[H]is characters all seem very real”

    The operative word here is ‘seem.’ Magwitch and Havisham are hardly realistic characters but Dickens makes them seem believable and distinct. I think ‘vivid’ might be a better term than ‘real’.

  4. mpledger says:

    London was highly stratified by class. People socialised within class, married within class etc. You just have to look at the huge range of English accents in such a small country to see how very insular all the social groupings were.

    I don’t think we should be judging London’s social networks as if they were like our own.

  5. Keith O'Rourke says:

    Not that new a thing in art: “We all know that Art is not truth. Art is a lie that makes us realize the truth, at least the truth that is given to us to understand.” Pablo Picasso https://www.brainyquote.com/quotes/quotes/p/pablopicas161578.html

    Does remind me of systematic versus random sampling trade-off in numerical integration.
