
Visualizing correlations circularly

Some time ago FlowingData had an article on visualizing tables, which is really about visualizing spreadsheets in terms of correlations between columns. Circos generates very colorful displays of this kind:

circos.png

Today I was impressed by a much cleaner and Tuftier variant on the theme by Mike Bostock, called Dependency Tree:

dependency-tree.png

Click on the link; it’s interactive. Jeff Heer and Bostock also have a new JavaScript visualization toolkit out, ProtoVis, which simplifies the creation of such visualizations. The computer scientist in me finds this development very cool. But I still like my correlation matrices.

6 Comments

  1. While Mike has done a good job with ProtoVis, the "Dependency Tree" visualization was originally developed by Danny Holten back in 2006. Here's the original paper: http://www.win.tue.nl/~dholten/papers/bundles_inf

  2. Michael says:

    These visualizations are beautiful but I think they are difficult to use. I would much rather use your correlation matrix.

    Are there examples of people actually using these circular diagrams for communicating specific connections among data?

    Also, it appears that these diagrams are communicating causality, not correlation.

  3. noahpoah says:

    "Tuftier" is a very nice neologism/use of derivational morphology.

  4. Russell Duhon says:

    Yeah, in this case the visualization isn't called a dependency tree, the visualization is of a dependency tree in a program.

    In this case, such visualizations are particularly useful because the relationships in the data are sparse. A correlation matrix (I assume we're discussing a visualized one, such as with a heatmap) would obscure the real relationships in the data. It is also very easy to see phantom patterns in correlation matrices due to ordering methodologies.

    For instance, in the dependency graph in question, someone familiar with what the colors indicate can immediately see where dependencies run counter to the majority flow for a package, which indicates modularity problems (in the programming organization sense, not the network science sense).

    I'm actually working on a paper using this technique with a hierarchical community detection algorithm, though I've had to change the approach after the eigenfactor people scooped the general idea. You can see what they're doing, with citation patterns, here: http://well-formed.eigenfactor.org/radial.html

    Oh, another thing the visualization supports well is interaction. There's a natural interaction approach of dragging a chord across the edges of interest, highlighting them.

  5. Aleks Jakulin says:

    Justin, thanks for the link!

    Michael, the second diagram actually shows dependencies between software modules, not really causality. But whenever you have a scale from -1 to 1, as you do with correlation, you can apply the same visualization.

    Russell, I used hierarchical clustering with some modifications to sort my data before I put it into a matrix.

  6. jebyrnes says:

    I played around with the circular layout engine in the gplots package, placing correlation coefficients into an adjacency matrix. It worked pretty well, although it becomes difficult to contain the paths within the circle.
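
The approach described in the last comment (correlation coefficients as an adjacency matrix, laid out on a circle) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the gplots code the commenter used; the column data, the `threshold` value, and all function names are assumptions made up for the example. One possible way to keep the paths inside the circle is to draw each chord as a quadratic Bezier curve whose control point is the circle's center, since such a curve stays within the triangle formed by its two endpoints and the center.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Return column names and the full correlation (adjacency) matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    return names, corr

def circle_positions(n, radius=1.0):
    """Place n nodes evenly around a circle centered at the origin."""
    return [
        (radius * math.cos(2 * math.pi * i / n),
         radius * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]

def chord_point(p0, p1, t):
    """Point at parameter t on a quadratic Bezier from p0 to p1.

    The control point is the origin, so its term vanishes:
    B(t) = (1 - t)^2 * p0 + t^2 * p1.
    """
    a, b = (1 - t) ** 2, t ** 2
    return (a * p0[0] + b * p1[0], a * p0[1] + b * p1[1])

def chords(names, corr, threshold=0.5):
    """List (name_i, name_j, r, pos_i, pos_j) for pairs with |r| >= threshold."""
    pos = circle_positions(len(names))
    result = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = corr[i][j]
            if abs(r) >= threshold:
                result.append((names[i], names[j], r, pos[i], pos[j]))
    return result

# Hypothetical toy data: four short columns of a spreadsheet.
demo_columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 4, 3, 2, 1],
    "d": [1, -1, 1, -1, 1],
}
demo_names, demo_corr = correlation_matrix(demo_columns)
demo_chords = chords(demo_names, demo_corr, threshold=0.5)

for name_i, name_j, r, p0, p1 in demo_chords:
    midpoint = chord_point(p0, p1, 0.5)
    print(f"{name_i} -- {name_j}: r = {r:+.2f}, chord midpoint = {midpoint}")
```

The sign of `r` could be mapped to color and its magnitude to line width when the chords are actually drawn; the sketch only computes the geometry and leaves rendering to whatever plotting tool is at hand.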