Two kinds of book

Posted on April 23, 2009 12:22 PM by Andrew

One of the things Brad Paley talked about the other day was the computer program he used to make a visualization of the text of Alice in Wonderland [link fixed]. (Click on the “Alice in Wonderland” link; it’s really cool.)

My first question when I saw this was, why is the book presented as a circle rather than a line? The circle places the end of the book at the same place as the beginning. There are some reasons this might make sense–after all, Alice wakes up from her dream at the very end of the book, returning to where she was at the start–but, overall, I don’t think the circularity makes sense. I asked Brad during his talk, but he did not have time to respond (too many questions were being asked, a problem I’d love to have at my own talks!). He indicated that he did have a good reason, though, so if he lets me know I’ll report it here.

People asked what was the point of the TextArc display (other than it looking pretty), and Brad gave a bunch of examples of what the plot showed. In some ways it was similar to some of my statistical research efforts, in that the results were impressive but ended up confirming things that made sense and that, ultimately, we already knew. In my case, my colleagues and I found that American Indians are not randomly distributed in the social network; in Brad’s case, he found that Alice is a central character in Alice in Wonderland, that the words “Mock” and “Turtle” go together, and so forth. (See here for more.)

When pressed further, Brad justified TextArc as a souped-up index. This made a lot of sense to me: his graph tells you lots of information that’s not in a conventional index and also allows you to map straight back to the original text. I agree that it’s silly to criticize the program for what it doesn’t do. It’s an automatic program and does a lot. I’m also impressed by any program written more than 5 years ago that still works!

Anyway, one of Brad’s remarks about using this tool to understand text made me think that there are two kinds of books:
1. Books that you want to read straight through, from beginning to end.
2. Books that you use for reference, flipping through and looking for what you need.
The horrible thing is that I write all my books as if they will be read from beginning to end, but I’m pretty sure most people read them as reference books. For most people–even most statisticians–reading Bayesian Data Analysis from beginning to end would be like me reading the instruction manual for my washing machine. I pick up the instruction manual when I need it, and then I look for what I need.

Anyway, I thought this might be relevant to TextArc and similar projects. Maybe Alice in Wonderland is not the best example; it might make more sense to use TextArc for a book such as Bayesian Data Analysis that has a sequence but is primarily used for reference. (I went to the TextArc site but couldn’t find the program; at least, there’s no easy way to feed in a book and have it produce the TextArc picture.)

9 thoughts on “Two kinds of book”

d^2 on April 23, 2009 at 8:55 am said:

if you could write a (statistics?) textbook where i could look up any particular (statistical) concept and learn how to use it with absolutely minimal prior (statistical) knowledge, then you'd be cooking with gas.

something like going to wikipedia and doing a search for "linear regression", but instead of reading _about_ linear regression, i get a textbook-style lesson on how to take some data, do the regression, and then interpret the results (and maybe there's an "advanced" flag you can set to add some proofs in there, too).

i imagine a blog or wiki is the ideal format for this, because it probably isn't a linear progression through the text. you could probably design the pages to assemble themselves by collecting the relevant topics (rather than following hyperlinks all over the place), with minimal extra text added for each concept.

supposing such a thing existed, this alice in wonderland mapping would probably be a decent way to look at it!
Corey on April 23, 2009 at 9:23 am said:

FWIW, I read BDA 2nd ed. straight through. I used it to teach myself practical Bayesian analysis — Jaynes's book convinced me that I needed to know how to do it, but didn't actually show me how.
Sam on April 23, 2009 at 10:12 am said:

I'm afraid you used the same URL for the first two links; consequently, I can't see this visualization you were raving about :-(
E. V. Vance on April 23, 2009 at 11:36 am said:

Note that the link to the Alice in Wonderland visualization is the wrong one; it's the same as the link to "the other day."

Thanks.
Anonymous on April 23, 2009 at 12:28 pm said:

"Alice in Wonderland" link is broken.
conchis on April 24, 2009 at 12:23 am said:

"why is the book presented as a circle rather than a line?"

The whole "rubber bands" concept behind the placement of the words in the middle of the circle wouldn't really work if the words were arranged linearly, would it?
dumbledad on April 24, 2009 at 2:49 am said:

Isn't Paley's elliptical plot there so that he can place words at their 'centre of mass'? I.e., each word is attached by invisible springs to all the lines around the outer ellipse that the word occurs in. If the book's lines were laid out in a straight line, then the centre of mass for each word would be somewhere along the line itself, whereas Paley wants them spread across the interior of the ellipse. Something closer to what you want (though still not linear) may be the tag clouds of proper nouns that Chris Harrison overlays on The Bible (look at the second half of this page http://www.chrisharrison.net/projects/bibleviz/ )
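dumbledad's spring picture can be made concrete. If every spring has the same stiffness and zero rest length, the point where the pulls balance is simply the average of the anchor points, so each word lands at the centroid of the lines it appears on. The Python sketch below assumes that simplification; the function name `ellipse_layout`, the even angular spacing of lines, the plain whitespace tokenization, and the default semi-axes are illustrative choices, not Paley's actual TextArc code.

```python
import math
from collections import defaultdict


def ellipse_layout(lines, a=2.0, b=1.0):
    """Place each word at the centroid of the lines it occurs on.

    Assumes line i of n sits at angle 2*pi*i/n on an ellipse with
    semi-axes a (horizontal) and b (vertical).
    """
    n = len(lines)
    anchors = [
        (a * math.cos(2 * math.pi * i / n), b * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]
    # One spring per line containing the word, even if the word repeats there.
    hits = defaultdict(list)
    for i, line in enumerate(lines):
        for word in set(line.split()):
            hits[word].append(anchors[i])
    # Equal-stiffness, zero-rest-length springs balance at the mean anchor point.
    return {
        word: (
            sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts),
        )
        for word, pts in hits.items()
    }
```

A word that occurs all the way around the ellipse gets pulled toward the middle, while a word confined to one passage stays near that stretch of the rim, which is the effect dumbledad describes.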
Andrew Gelman on April 24, 2009 at 4:30 am said:

If the book were placed on a line from left to right on the page, yes, you'd have to modify the "rubber bands" rule, but I'd think it would be pretty simple: the horizontal position of each word would be at its average position, and the vertical position would depend on how spread out the word is in the text. Something like this: words used locally would be very close to the placement of the text, and words used in more diverse places would be further away. If, say, the text were laid out horizontally at the bottom of the display (with each line written vertically, I suppose, in the manner of the current setup), then words such as Dormouse would be pretty low down near the strip of text, whereas words such as Alice would be higher up.

Or maybe this wouldn't quite work; I'm sure Brad has tried this and other ideas. My intuition, though, is that something like this could do ok.
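The left-to-right rule above can likewise be sketched in a few lines of Python. The comment doesn't say how "spread out" should be measured, so this sketch assumes the population standard deviation of the line indices where a word occurs; `linear_layout` is a hypothetical name, and tokenization is again plain whitespace splitting.

```python
import statistics
from collections import defaultdict


def linear_layout(lines):
    """Return {word: (x, y)} for a left-to-right layout of the text.

    x: mean index of the lines the word occurs on (its average position).
    y: population standard deviation of those indices, so a word used in
       one place sits near the strip of text (y close to 0) and a word
       used throughout the book sits higher up.
    """
    positions = defaultdict(list)
    for i, line in enumerate(lines):
        # Count each line at most once per word.
        for word in set(line.split()):
            positions[word].append(i)
    return {
        word: (statistics.mean(idx), statistics.pstdev(idx))
        for word, idx in positions.items()
    }
```

With this choice, a word confined to one scene gets a small vertical coordinate and stays near the strip of text, while a word like Alice, which appears throughout, gets a large one.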
Keith O'Rourke on April 25, 2009 at 4:30 am said:

Andrew:

J.-C. Gardin did related work on representing academic discourse (many years ago).

This reference may be the most recent:

Gardin, J.-C. (2002). Archaeological discourse, conceptual modeling and digitalization: an interim report of the logicist program. In M. Doerr and A. Sarris (eds.), Computer Applications and Quantitative Methods in Archaeology: Proceedings of the 30th Conference, Heraklion, Crete, April 2002. Heraklion: Archive of Monuments and Publications, Hellenic Ministry of Culture, 2003, pp. 5-12.

Keith
p.s. The term "logicist program" was also coined by Art Dempster. After reading Gardin, Dempster said it was not related to what Gardin was doing, and after reading Dempster, Gardin said it was (of course, they will both be right).

p.s.2 One of Gardin's interesting accomplishments was having one of his programs write an article "like" Claude Levi-Strauss would. When he presented it to Levi-Strauss, asking if it was one of his articles, Levi-Strauss read it and said, "Yes, it is, but I don't seem to have a copy. Do you mind if I keep this one?"
