
What’s better than R for exploratory data plots?

I use R for just about everything, including exploratory data analysis and graphics. The only other package with which I have any familiarity is Mathematica. I’ve been generally satisfied with R graphics, although there are things that I always struggle with, such as:

  1. using expression() to get symbols and math expressions the way I want them;
  2. writing line labels at an angle so that they lie along the line (and then having to re-do this if I change the dimensions of the plot, e.g. by changing the margins);
  3. setting margins when I have multiple plots on a single figure, so that the axis labels fit but there is still enough room for the data;
  4. placing labels or legends where they don’t get in the way of the plot.
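
Item 2 is, at bottom, a small piece of arithmetic: the angle of the line on the page depends on the axis ranges and on the physical size of the plot region, which is why the label has to be re-rotated whenever the margins change. Here is a minimal sketch of that calculation in Python, assuming linear axes. The function name and its arguments are hypothetical; in R the two inputs would correspond to par("usr") and par("pin"), and the result would be passed as the srt argument of text().

```python
import math

def label_angle_deg(slope, usr, pin):
    """Rotation in degrees that makes text lie along a line of the given slope.

    slope: slope of the line in data units (dy/dx)
    usr:   (x_min, x_max, y_min, y_max), the axis limits in data units
    pin:   (width, height) of the plot region in inches
    Assumes both axes are linear.
    """
    x_min, x_max, y_min, y_max = usr
    width_in, height_in = pin
    # Inches on the page per data unit, along each axis.
    x_scale = width_in / (x_max - x_min)
    y_scale = height_in / (y_max - y_min)
    # Angle of the line as it is actually drawn on the page.
    return math.degrees(math.atan2(slope * y_scale, x_scale))

# Same line, same axis limits, two different plot shapes.
square = label_angle_deg(1.0, (0, 10, 0, 10), (5, 5))
wide = label_angle_deg(1.0, (0, 10, 0, 10), (8, 4))
print(square, wide)
```

The same slope-1 line needs a smaller rotation in the wide plot than in the square one, so the angle has to be recomputed after the final plot dimensions are set.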

At least in my normal course of business, all of these issues only come up when I’m making publication-quality figures (or at least presentation-quality), not when I’m exploring the data or comparing the data to predictions. So I’ve always thought of R as being excellent for exploratory data analysis, and fair or poor for making publication-quality output. But sometimes I do find myself taking a lot of time on an exploratory plot (such as the example here), which is frustrating.
And then a friend mentioned that he thinks R is good for publication-quality graphics (you have precise control over everything) but terrible for exploratory graphics, which is exactly the opposite of the way I think of it! He pointed out that, aside from some crude things you can do with identify(), R’s graphics are non-interactive: you can’t click to remove bad data points, or zoom in and out, or click on a line and change its color or width. He said good exploratory graphics programs let you do all of these things. But here’s the kicker: he couldn’t name a good exploratory graphics program! He says he knows they exist, but he doesn’t know what they are.

So: what’s worth a look, besides R and Mathematica? Am I using R just because it’s what I’m used to (and it’s free), or is it actually the best thing out there, as I have always assumed?

33 Comments

  1. Anonymous says:

    Check out Python's matplotlib:

    http://matplotlib.sourceforge.net/

  2. Hector says:

    I recently ran into Vaingast's "Beginning Python Visualization" and I think it's a good reference for getting started with Python's scientific extensions: Numpy, Scipy, and Matplotlib. Together these basically provide a framework similar to Matlab. I have to admit that I don't really know how advanced its EDA abilities are (so far).

  3. Jonathan says:

    What about JMP from SAS? They've got a great dynamic graph builder – really beats anything else I've seen for visual data exploration.

    There's a video demo here: http://www.jmp.com/software/jmp8/demos/sall_graph

    The interesting part begins about halfway through!

  4. James says:

    I find that JMP is very useful for exploratory analysis.
    However, presentation/production-quality graphics require a lot of tinkering.

  5. Vince B. says:

    Have you checked out GGobi? http://www.ggobi.org/

  6. John Johnson says:

    There's always R with GGobi (there is even a book about it) and the iplots R package, if you want to remain within the R family.

  7. Bill Harris says:

    I tend to use J for a number of exploratory graphs. The one thing I miss is the dynamic capability of the old XLISP-STAT; I'm beginning to try out GGobi for that.

  8. KW says:

    Two ideas:

    1. Spotfire

    2. ViSta (the one built on xlisp-stat).

  9. Mike Dewar says:

    I'd like to add to the recommendation of Python's matplotlib; it's great for both exploratory analysis (Matlab style) and for publication-quality graphs. Combined with ipython (in –pylab mode) it makes a very neat solution. Plus with RPy it's easy to glue together with R. It's got a very extensive, well-documented API that you can drill into as much or as little as you want.

    For your points:
    1) is easy, as matplotlib can just render latex directly (e.g.),
    2) sounds a bit hard!
    3) is really easy to do interactively
    4) is pretty straightforward, but there's not a way to do it with the mouse afaik.

    The main downside for me is that it sucks at surfaces. Other than that it's great…

  10. Amy says:

    Take a look at JMP: exploratory analysis is one of its guiding principles.

  11. Aaron Mackey says:

    For actual exploration (i.e. I'm not sure what I'm going to find, what I need to do, etc., but I'll figure it out as I poke around), Spotfire really can't be beat. Too bad it's commercial.

  12. shane says:

    If you want to share graphs and explore data with coworkers online, it isn't actually that hard to do in Adobe Flash. I used fusion graphs (because they were used at lsn, which I liked) for these graphs, but in an ideal world I would roll my own graphing tools in something like Flash/Flex/Flare, which would produce much the same graphs while giving me more control. Fusion graphs lets you display your data online and interactively without learning Flash, and I don't think it is the only such program.

  13. Bruce McCullough says:

    S-Plus for exploratory graphics.

  14. Martin Theus says:

    Why don't you look at Mondrian. It supports all the EDA plots that prevailed and also offers a lot of recent development in graphical data analysis. Everything is interactive and supports a really exploratory working style.

    If you want to have guidance on how to use interactive graphics for data analysis you may find this site to be a valuable source.

  15. Drew Conway says:

    I would also like to add my endorsement for matplotlib in Python.

    It solves all four of the issues in your post.

  16. Mr. Gunn says:

    I'm going to get creamed for this, but Excel is actually kinda useful for smallish amorphous datasets that you want to get a feel for.

    ManyEyes might be useful too.

  17. Luis says:

    I still think that R is best for EDA. However, once you figure out what you want to graph, either SigmaPlot or Statistica could be a good option:

    http://www.sigmaplot.com/modules/products/sigmapl

    http://www.statsoft.com/uniquefeatures/graphics.h

  18. oz says:

    "I'd like to add to the recommendation of Python's matplotlib; it's great for both exploratory analysis (matlab style) and for publication quality graphs."

    Why not use Matlab? For me it works great
    (not being open source makes it harder for me to recommend it, though).

  19. Chris says:

    My favourite tool for visually sifting through data is Aabel. Two things to note, however: 1. it is Mac-only and 2. it isn't free.

    Aabel does a tremendous job of handling many variables, and allows you to quickly switch between graph types to get the sort of view that you need. It includes a huge range of chart types, and little of the 3D chart junk that you might expect. Its filtering capabilities are also very useful when you want to extract subsets from the data, or remove outliers.

    Again, not cheap, but it's the best I've found for OS X.

  20. KMC says:

    Aabel is worth mentioning.
    http://www.gigawiz.com/

  21. Scott says:

    Good point. However Python/Numpy/RPy/ipython (plus, maybe, MayaVi for surfaces/volumes) is more likely than Matlab to appeal to someone who likes R, I think.

    Open-source is only one reason for this. The other reason is, Python's the sort of language a computer scientist would design, while Matlab's the sort an engineer would design. The esthetic criteria by which they excel are different, and I think R is closer to Python on that axis.

  22. With regard to the Excel comment, I would add that Excel has some little-known but very powerful add-ins that make the program slightly more appealing. Dig-db, for example, is a great macro.

  23. Gad Abraham says:

    Regarding placing legends, the lattice package does that nicely (see auto.key parameter to xyplot, for example).

  24. Mr Gunn and Gad Abraham:

    As a long-time advanced Excel chart user and more recent R user, I have to say that R's charting capabilities are far superior to Excel's.

    R's base graphics beat all my Excel charting tricks and add-ins hands-down.

    Phil – I suggest you stick with R.

  25. Frank D says:

    Try GeoDa for spatial data and STARS (though not as intuitive) for spatial panels.

  26. What R packages do you use?

    I've been playing around with ggplot2 (http://had.co.nz/ggplot2/) for exploratory graphics – seems very easy to use.

    Personally, I'd be wary of a GUI-based exploratory graphics program. I imagine that it'd take longer to do anything.

  27. I like using Tableau for getting a good initial look at a data set. I find it creates very good plots by default and the interaction is very fluid and fast. If you want to get an idea of what R could do, but doesn't, Tableau is a good place to start.

    Windows only and not free, but a 30-day trial is available.

    Disclosure: my adviser is one of Tableau's founders.

  28. Ryan says:

    I may be a little late posting here, but I think R works great for everything.

    As for the "interactive" EDA, I would argue that R graphics are plenty interactive. You take a look at the data, try something, take another look at it, etc. I'm sure this isn't truly "interactive" to the folks who want something like animations and linked brushing and what-not. I have tried such systems and much prefer creating a lot of "static" R plots. The interactivity comes from changing things, making lots of plots, and being well-versed in R so that it all happens quickly. This gives me an "audit-trail" as well and helps me keep track of what I have done.

    In my opinion, EDA and model building are synonymous. Most data sets I have worked on have very quickly led me to situations where the analysis becomes too complex so that "interactive" graphics systems become too cumbersome or impossible to use. Perhaps I am just old-fashioned, though!

  29. Martin Theus says:

    Ryan,

    There is a good point in what you said, but I would take issue with two things:

    1. Model building is far more restrictive than what you do with EDA. In fact, a successful EDA process should lead to good (often less complex) models.
    2. There are things you can do with interactive graphics that are extremely hard or extremely inefficient with a series of static plots, and you often run the risk of missing things if it takes too long to follow up on your ideas.

  30. apeescape says:

    I know I'm a little late, but here's one for GGobi. I haven't used it a lot, but it has some nice features to explore high dimensional data and you can also interact with it as well. The corresponding package for R is rggobi.

    iPlots is another R package that is interactive.

  31. WD says:

    Very late with this comment, but I came across this post while searching for R info.

    While buggy, there's the R package playwith for interactive GUI graphics manipulation.

  32. Pbleic says:

    There is nothing like Spotfire, which was recently bought by Tibco. It is expensive, but very powerful and flexible, and is designed specifically for exploration of data. You can go from an Excel spreadsheet of complex data to a data visualization in seconds.

  33. I prefer R, but I will also use gnuplot. If you like Matlab-style plotting, you might also like the free Matlab clone Octave, which uses gnuplot as a backend for its graphics.