
Visualization of large datasets

Gregor Gorjanc writes,

Gentleman et al. published a paper on visualizing genomic data. Many of the issues it raises apply to other areas of data visualization. I particularly like the scatterplot examples on page 17. I [Gregor] often have massive datasets, and it is hard to see anything in a plain scatterplot of them. smoothScatter from the geneplotter R package can help a lot in producing more informative and eye-catching graphs. Try the following (from the smoothScatter help page). And my examples: unfortunately not in English, but the graphs show some context.

library("geneplotter") ## you also need annotate and Biobase
                       ## from BioC, and RColorBrewer
if(interactive()) {
  x1 <- matrix(rnorm(1e4), ncol=2)
  x2 <- matrix(rnorm(1e4, mean=3, sd=1.5), ncol=2)
  x <- rbind(x1, x2)
  layout(matrix(1:4, ncol=2, byrow=TRUE))
  smoothScatter(x, nrpoints=0)
  smoothScatter(x)
  smoothScatter(x, nrpoints=Inf, colramp=colorRampPalette(RColorBrewer::brewer.pal(9, "YlOrRd")), bandwidth=40)
  colors <- densCols(x)
  plot(x, col=colors, pch=20)
}
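For readers who do not have Bioconductor installed, here is a minimal sketch of the same idea using only base R. It assumes R 2.9.0 or later, where smoothScatter is available in the graphics package and densCols in grDevices; the seed and the plot titles are illustrative choices, not part of the original help-page example.

```r
# Assumption: R >= 2.9.0, where smoothScatter() is in 'graphics' and
# densCols() is in 'grDevices' (both rely on the recommended KernSmooth package).
set.seed(1)  # illustrative seed, only to make the sketch reproducible

# Two overlapping Gaussian clouds of 5000 points each, as in the example above
x1 <- matrix(rnorm(1e4), ncol = 2)
x2 <- matrix(rnorm(1e4, mean = 3, sd = 1.5), ncol = 2)
x  <- rbind(x1, x2)

# One colour per point, chosen from the local point density
cols <- densCols(x)

if (interactive()) {
  layout(matrix(1:3, ncol = 3))

  # Plain scatterplot: dense regions collapse into a solid blob
  plot(x, pch = 20)
  title(main = "plain scatterplot")

  # Smoothed density image, with the 100 most isolated points drawn on top
  smoothScatter(x, nrpoints = 100)
  title(main = "smoothScatter")

  # Ordinary scatterplot coloured by local density
  plot(x, col = cols, pch = 20)
  title(main = "densCols")
}
```

Comparing the three panels side by side shows why the smoothed versions help: the plain scatterplot hides the two-cluster structure that the density-based displays make visible.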


  1. Martin Theus says:


    You might be interested in the book "Graphics of Large Data Sets", which is due this summer. (I hate advertising my own work, but …) It contains many real-world examples that might be helpful when dealing with large data.

    There are also some slides from a talk I gave some years ago that you might like.

  2. Gregor says:

    Martin, thanks!

    Any pointers to good work are welcome.