What to graph, not just how to graph it

Robin Gong writes:

While we’re on the topic of visualization, I’ve been puzzled by a more general question and I’m unsure where it fits in actually.

There seem to be two parts to good visualization practice, and in our class we’ve been focusing more on one of them, namely “how do I get my point across?” To me that’s a psychology question for which a recipe-type solution could exist, e.g., which choices of graph types, layouts, and details best reduce the cognitive burden on the audience, making the visual cues more salient and the graph more effective. But this “how” question begins with the premise that I know what point is to be made, and what if I don’t?

When doing EDA on high-dimensional data, how does one visualize the potentially multi-way joint dependence among variables? Or when checking the performance of a high-dimensional model, can I see anything (estimation, model fit, prediction) beyond the univariate or bivariate level? And to throw in something inspired by my immediate research: suppose one wishes to compare posterior inference from MCMC with that from approximate inference methods (e.g., VB or EP). What can be done to display the loss of fidelity in the approximation, beyond dry metrics such as KL divergence or marginal likelihood? I call them dry because they don’t reveal the specifics of the problem, such as which areas of the support or which dimensions are missed the most.

The challenge is that we are relying on visualization precisely to teach ourselves about the behavior of an object (data or model) that would otherwise be impossible to learn about; but if we are ignorant of the object, how do we visualize it to begin with? So I think this is perhaps fundamentally a statistical methodology question, whose solution calls for a different kind of ingenuity; after all, if I knew how to display a higher-dimensional object in lower dimensions with all its essence captured, I would have found a good inferential method. People say that if an MCMC sampler doesn’t converge, chances are the model is problematic to begin with, and I wonder if it’s the same with statistical visualization: if I can’t display it properly, am I just asking the wrong questions?
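One concrete way to make that kind of comparison less dry is to line up the MCMC draws and the approximate draws dimension by dimension and ask where the marginals disagree most. Here is a minimal sketch in Python; the two arrays are hypothetical placeholders for whatever the samplers actually return:

```python
import numpy as np
from scipy import stats

# Hypothetical inputs: one row per posterior draw, one column per parameter.
rng = np.random.default_rng(0)
mcmc_draws = rng.normal(size=(4000, 10))                # stand-in for MCMC output
approx_draws = rng.normal(scale=0.7, size=(4000, 10))   # stand-in for VB/EP output

n_dim = mcmc_draws.shape[1]
ks = np.empty(n_dim)   # Kolmogorov-Smirnov distance, per dimension
w1 = np.empty(n_dim)   # 1-D Wasserstein distance, per dimension
for j in range(n_dim):
    ks[j] = stats.ks_2samp(mcmc_draws[:, j], approx_draws[:, j]).statistic
    w1[j] = stats.wasserstein_distance(mcmc_draws[:, j], approx_draws[:, j])

# Rank dimensions by how badly the approximation misses them, then overlay
# the marginal histograms of the worst offenders.
worst = np.argsort(-ks)[:3]
print("dimensions with the largest marginal discrepancy:", worst)
```

This only localizes marginal discrepancies, of course; pairwise scatterplots of the flagged dimensions are a natural next step.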

My reply: There are a few interesting ideas here:

First, we do spend lots of time on the details of how to graph a particular idea or pattern of information, but not so much time on what to graph.

Second, there’s the challenge of trying to discover the unexpected in high dimensions. It’s my impression that there was a lot of research on this in the 1970s and 1980s: The statistics world was pretty small back then, and after Tukey started writing about exploratory data analysis, various people started working on ideas such as rotating point clouds, or automatically searching for interesting dimensions for graphical comparisons. My guess is that a lot of this work is still valuable and worth looking into further. The idea seems very powerful, to treat the human and the computer as a pattern-recognition team, and to have an algorithm to find projections that are worth further exploration.
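A minimal version of that idea is easy to sketch: generate random 2-D projections, score each one with some index of “interestingness,” and hand the top-scoring views to the human. The sketch below is only an illustration of the search loop; the crude kurtosis-based score and the function names are placeholder choices, not a standard projection-pursuit index.

```python
import numpy as np

def random_frame(p, rng):
    """Random 2-D orthonormal frame in p dimensions (QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.normal(size=(p, 2)))
    return q

def interestingness(z):
    """Crude departure-from-Gaussianity score: mean absolute excess kurtosis of
    the two projected coordinates. Scoring by variance instead would essentially
    recover principal components."""
    zc = (z - z.mean(axis=0)) / z.std(axis=0)
    return np.mean(np.abs((zc ** 4).mean(axis=0) - 3.0))

def search_projections(x, n_tries=2000, keep=5, seed=0):
    """Return the `keep` most interesting random 2-D projections of x (n x p)."""
    rng = np.random.default_rng(seed)
    scored = []
    for _ in range(n_tries):
        frame = random_frame(x.shape[1], rng)
        scored.append((interestingness(x @ frame), frame))
    scored.sort(key=lambda t: -t[0])
    return scored[:keep]   # each entry: (score, p x 2 projection matrix)
```

Each returned frame can then be scatterplotted (x @ frame) and judged by eye, which is the human half of the pattern-recognition team.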

Third, how do you compare in high dimensions, for example if you have multiple chains of HMC and you want to check that they’re pretty much in the same place? Steve Brooks and I did some work on this a while ago with the multivariate potential scale reduction factor, but it didn’t work so well in practice, perhaps because our goals weren’t so clear.
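For concreteness, here is a bare-bones sketch of that multivariate diagnostic in the Brooks and Gelman style, comparing the between-chain and within-chain covariance through the largest eigenvalue of W^-1 B. It is only a sketch of the basic idea, not the split or rank-normalized versions recommended nowadays:

```python
import numpy as np

def mpsrf(chains):
    """Multivariate potential scale reduction factor (Brooks & Gelman style).

    chains: array of shape (m, n, p): m chains, n draws each, p parameters.
    Values near 1 suggest the chains are exploring the same region.
    """
    m, n, p = chains.shape
    chain_means = chains.mean(axis=1)          # (m, p)
    grand_mean = chain_means.mean(axis=0)      # (p,)

    # Between-chain covariance of the chain means (B/n in the usual notation).
    d = chain_means - grand_mean
    b_over_n = d.T @ d / (m - 1)

    # Pooled within-chain covariance W.
    w = np.zeros((p, p))
    for j in range(m):
        e = chains[j] - chain_means[j]
        w += e.T @ e
    w /= m * (n - 1)

    # The largest eigenvalue of W^-1 (B/n) drives the multivariate diagnostic.
    lam = np.max(np.real(np.linalg.eigvals(np.linalg.solve(w, b_over_n))))
    return (n - 1) / n + (m + 1) / m * lam
```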

9 thoughts on “What to graph, not just how to graph it”

  1. The essential difference is between “exploratory” analysis and “presentation.” All of the recent attention on visualization (as well as numerous commercial products) has focused on presenting visualizations of data. Using visualizations to find the insights involves statistical reasoning, visualization techniques, and subject-area expertise. I don’t think there is a way to divorce this task from any of those components. As an example (more concrete than MCMC in my mind), consider comparing different classifiers, say logistic regression and random forests. One interesting way of looking at the outputs of each is to look at the distribution of the classification probabilities (a small sketch of such a comparison appears at the end of this comment). If one technique produces more variable probabilities, it is potentially more capable of distinguishing between observations. It may or may not do a better job of prediction, but it is potentially capable of producing more differentiation between observations. Whether or not this is desirable would require further analysis.

    How do we learn or teach people to look for such things? There are principled approaches that are forms of critical thinking – look for differentiated outputs if you believe your data are diverse – divide your data over time if you are looking for changes or trends – divide your data by categories if you believe they may differ in important ways (try clustering if you are not even sure of which ways they may differ), etc. I don’t see developing such abilities as any different from learning how to do data analysis. Perhaps it is a matter of emphasis – we place too much emphasis on techniques rather than critical thinking – and that may apply to visualization in the same way it applies to other analytical techniques.

    What I have seen in my own teaching experience is that it is much easier for students (and teachers!) if assignments focus on analyzing or visualizing a “known” object – that is, we provide the “what” to visualize and focus on “how to” visualize it. When presented with open-ended data and asked to explore it, students have much more difficulty. And it is also harder for me, as the instructor, to figure out how to help students. Experience and practice count for a lot. I’m sure there are general principles and approaches, but my own have grown out of my own experience. More systematic approaches would be great to see; suggestions would be appreciated.
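    As a concrete version of that classifier comparison, here is a small sketch assuming scikit-learn; the synthetic data is just a stand-in for whatever problem is at hand, and the point is to contrast the spread of the two models’ predicted probabilities rather than their accuracy:

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in data; in practice X and y come from the problem at hand.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }

    for name, model in models.items():
        model.fit(X_train, y_train)
        p = model.predict_proba(X_test)[:, 1]   # predicted P(class = 1)
        # Compare the spread of the predicted probabilities, not just accuracy.
        print(f"{name:20s}  sd={p.std():.3f}  "
              f"quartiles={np.percentile(p, [25, 50, 75]).round(2)}")
    # Histograms of p for each model make the same comparison visually: a wider
    # distribution means more differentiation between observations.
    ```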

  2. > if I can’t display it properly, am I just asking the wrong questions?
    Or just not stating the question clearly enough?

    Interesting, and more generally this is about how to do insightful science, though here the focus is just on visualization tools.

    G. E. P. Box. Science and statistics. Journal of the American Statistical Association, 71(356):791-799, 1976.

    G. E. P. Box. Sampling and Bayes’ inference in scientific modeling and robustness. Journal of the Royal Statistical Society, Series A, 143(Pt. 4):383-430, 1980.

    “in such confusions of thought as that of active willing (willing to control thought, to doubt, and to weigh reasons [being scientific]) with willing not to exert the will (willing to believe [looking for automatic solutions/rules])”
    from http://en.wikisource.org/wiki/A_Neglected_Argument_for_the_Reality_of_God

  3. My own thought is that, like The Music Man, you’ve got to know the territory. The areas in which I’ve been most successful in teasing out interesting relationships are areas in which I’ve invested an awful lot of time and effort in understanding, not just the data, but more importantly, the processes that give rise to the data. True, a statistician can often bring insight to a new field without such an investment, but I think that you get that much more insight with a deeper understanding.

    My $.02

    • There’s a physicist named Marvin Weinstein who has a thing he calls dynamic quantum clustering. He takes a bunch of data points in multidimensional space, and does some tricks to infer a smoothed distribution, which he regards as an energy distribution. He then evolves the points using some quantum-inspired algorithm that includes some chance of tunneling so the points don’t just get stuck in the tiniest local minima, but rather fall towards the cluster centers (a loose classical analogue of this mode-seeking step is sketched at the end of this comment). He then watches the movie of this happening (projected into 3 PCA dimensions) and can thereby, it seems, consistently say interesting things about the problem domain, even when he knows very little about it besides the data.

      I believe that this kind of trick is possible. Statistically, he’s doing at least a half dozen things that aren’t entirely kosher, but through this black magic he’s developed a practically useful eye for multidimensional clustering; distinguishing which clusters are real and even sometimes something about the internal structure of those clusters. I’m not saying he can just come into any field and use this trick to master it (http://xkcd.com/793/); what he says may be consistently inspiring to the domain experts, but he wouldn’t know what to do with it without them. Still, even saying something inspiring is of real value.

      I guess my point is that if statisticians throw up their hands at finding general tools to explore data even if you don’t have a lot of domain knowledge, statistical outsiders — physicists or computer scientists or whoever — will fill in that gap. We shouldn’t give up. Or at least, we should be ready to think and talk about the methods that others come up with, even if it’s clear there’s no way we’ll ever prove those methods’ asymptotic consistency or whatever.
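      Without claiming to reproduce dynamic quantum clustering itself, the broad idea described above, letting points drift toward the modes of a smoothed density estimate and watching them coalesce in a low-dimensional projection, has a plain classical analogue in mean-shift-style iteration. A rough sketch, purely for illustration:

      ```python
      import numpy as np
      from sklearn.decomposition import PCA

      def drift_to_modes(x, bandwidth=1.0, n_steps=30):
          """Mean-shift-style drift: each point moves toward the kernel-weighted
          average of the original data, so points collect near density modes.
          A classical stand-in for the evolution step described above, not
          Weinstein's quantum algorithm."""
          pts = x.copy()
          frames = [pts.copy()]
          for _ in range(n_steps):
              d2 = ((pts[:, None, :] - x[None, :, :]) ** 2).sum(-1)
              w = np.exp(-d2 / (2 * bandwidth ** 2))
              pts = (w @ x) / w.sum(axis=1, keepdims=True)
              frames.append(pts.copy())
          return frames

      # Watch the "movie" in 3 PCA dimensions, as in the description above.
      rng = np.random.default_rng(1)
      x = np.vstack([rng.normal(c, 0.5, size=(100, 8)) for c in (0.0, 3.0)])
      frames = drift_to_modes(x, bandwidth=1.0)
      pca = PCA(n_components=3).fit(x)
      movie = [pca.transform(f) for f in frames]   # feed these to a 3-D plot
      ```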

  4. Also, does one need to actually “see” the multivariate dependencies in higher dimensions? Or is it enough to be able to identify them via a test or algorithm?

    In other words, are we hunting for a procedure or a visualization per se?

  5. I’m not sure how much of this you remember, as Peter Huber left Harvard right around the time that you arrived. But before you came, he and some of his students were pretty active in this area.

    A couple of interesting ideas were floating around at the time. One was scatterplot brushing (which is still fairly common today), where you would simultaneously highlight the same data point in a number of two-d views.

    One was rotation, where you could look at various 2-D slices while constantly changing the view. This could give the impression of 3-D structures.

    The grand tour was an interesting idea that built on that. It would randomly choose a new projection onto two dimensions and smoothly move between the current view and that one. When it reached that place, it would randomly choose another projection and move on. The analyst would stop it when it got to an interesting view. (A bare-bones sketch of this appears at the end of this thread.)

    One was projection pursuit: basically you would look at various projections from the higher-dimensional space, trying to maximize some property (some measure of interestingness). Note that principal components analysis is a special case, where the property you are maximizing is variance.

    There are a number of reasons that it didn’t catch on at that particular point in time (late ’70s, early ’80s). One was that the computer power was just not up to it. I remember maintaining a Sun workstation which had 8 MB of memory (and cost a pretty penny). Before that we were working on an Apollo workstation. I remember that, rotating a point cloud of about 250 points, we could get around 4 frames per second: not enough to give the 3-D illusion without stereo glasses.

    The second reason is that the statistician involved in the project could never quite get around the fact that we might be just chasing noise. Obviously, the purpose was hypothesis generation, not hypothesis testing, but still we felt nervous about giving a tool like this to the uninitiated. Interestingly enough, here in the College of Ed, factor analysis is what everybody does. Confirmatory factor analysis allows the researchers to work with less fear of the noise chasing problem.

    Peter Huber stumbled upon the third reason in his research. Basically, you need to do a fair amount of data preparation and cleaning before putting things into Prim-H (which was similar to ggobi). A lot of us were working on extending and testing David Donoho’s ISP stat package (which was similar to S, which was just starting to make its way out of the lab at the time).

    Today, we have computer scientists who haven’t really looked at this period of history. They are rather fearless when it comes to the noise-chasing problem. Looking back on what was happening 30 years ago, I feel that there is a strong need to use substantive knowledge about the variables to guide the dimension-reduction process. I worry that without that input it is too easy to find the faces of celebrities in the optimal arrangement of our breakfast cereal.

    • See my comment above about DQC. It’s a thing that I think in practice does a pretty good job of finding signal without chasing too much noise, but I do realize that there’s no way to prove that except by piling up examples.
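    A bare-bones version of the grand tour mentioned above is also easy to write down: pick random 2-D orthonormal target frames and interpolate between successive frames, re-orthonormalizing at each step. The sketch below uses a crude linear blend rather than the geodesic interpolation of the published grand tour:

    ```python
    import numpy as np

    def random_frame(p, rng):
        """Random orthonormal p x 2 frame."""
        q, _ = np.linalg.qr(rng.normal(size=(p, 2)))
        return q

    def grand_tour_frames(p, n_targets=10, steps_between=30, seed=0):
        """Yield a smooth-ish sequence of 2-D projection frames.

        Crude interpolation: linearly blend consecutive random target frames
        and re-orthonormalize with QR at every step; enough to get the
        rotating-view effect, though not the true geodesic path."""
        rng = np.random.default_rng(seed)
        current = random_frame(p, rng)
        for _ in range(n_targets):
            target = random_frame(p, rng)
            for t in np.linspace(0.0, 1.0, steps_between):
                blend, _ = np.linalg.qr((1 - t) * current + t * target)
                yield blend              # project the data as x @ blend
            current = target

    # Usage: for each frame, scatterplot x @ frame and pause briefly; the
    # analyst stops the loop when a view looks interesting.
    ```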
