What’s wrong with a kernel density?

In response to my offhand remark that the kernel densities in the article by Chen and Rodden are “tacky” and would be much better as histograms, commenter Anne asks:

What’s wrong with a kernel density? Too opaque a connection with the data? I [Anne] have had some unpleasant surprises using histograms lately, so I’ve been trying to get a feel for the alternatives.

My reply: Here are my problems with kernel densities (in this example, and more generally):

1. Annoying artifacts, such as kernel density estimates of all-positive quantities putting mass in the negative zone (see the R sketch after this list). This can be fixed, but (a) it typically isn’t, and (b) when there isn’t an obvious bound, you still have the issue of the density estimate putting mass in places it shouldn’t.

2. It’s hard to see where the data are. As I wrote in the blog post linked above, I think it’s better to just see the data directly, especially for something like vote proportions, which I can understand pretty well on their own. For example, when I see the little peak at 3% in the density in Figure 2 of Chen and Rodden, or the falloff after 80%, I’d rather just see what’s happening there than try to guess by taking the density estimate and mentally un-convolving the kernel.

3. The other thing I like about a histogram is that it contains the seeds of its own destruction–that is, an internal estimate of uncertainty, based on the variation in the heights of the histogram bars. See here for more discussion of this point, in particular the idea that the goal of a histogram or density estimate is not to estimate the true superpopulation density as accurately as possible (whatever that would mean in this example) but rather to get an understanding of what the data are telling you.
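
To illustrate point 1, here is a minimal R sketch with made-up positive data (not the Chen and Rodden vote shares): the default density() puts visible mass below zero, while the histogram stays on the support, and the bar-to-bar variation gives a rough sense of the sampling noise I mention in point 3.

    # toy strictly-positive data (illustration only)
    set.seed(1)
    x <- rexp(200, rate = 2)

    d <- density(x)    # default Gaussian kernel and bandwidth
    range(d$x)         # note that the estimate extends below zero

    par(mfrow = c(1, 2))
    plot(d, main = "kernel density: mass below zero")
    abline(v = 0, lty = 2)
    hist(x, breaks = 30, main = "histogram: stays on the support")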

7 thoughts on “What’s wrong with a kernel density?”

  1. I think it's useful to temper that view a little with some of the advantages of kernel density estimation.

    Histograms are widely used by people outside the world of professional statisticians, for whom the problem of choosing a reasonable bin size is not well known. A KDE with a default bandwidth setting at least produces visually comprehensible charts without requiring an unlikely degree of familiarity with statistics. (Maybe the people in question shouldn't be flinging histograms around without that familiarity, but they're going to anyway.)

    In addition, for a lot of the work where I use KDEs in preference to histograms, I'm generating a handful of charts, glancing at them to see if the peaks are where I want them, then throwing them away. In these kinds of cases, the fewer knobs I have to tweak before getting a usable sense of what's going on, the better off I am.
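
    A small R illustration of that point, with arbitrary simulated data: the histogram's appearance depends heavily on the breaks argument (which is only a suggestion to R's binning routine), while density() with its default bandwidth gives one reasonable-looking curve with no knobs touched.

      set.seed(2)
      x <- c(rnorm(150, 0), rnorm(50, 3))   # arbitrary bimodal sample

      par(mfrow = c(1, 3))
      hist(x, breaks = 5,  main = "coarse bins")
      hist(x, breaks = 50, main = "fine bins")
      plot(density(x), main = "default-bandwidth density")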

  2. I have taken to using ecdf (the empirical CDF) in R instead of kernel densities. It summarizes the data as a single CDF curve while still showing every data point. This avoids the bin-size issue with histograms and (I think) Andrew's problems with kernel densities. The trick is that you have to think in terms of CDFs instead of PDFs.
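
    A minimal sketch of what I mean, with placeholder data:

      set.seed(3)
      x <- rexp(200)                        # placeholder data
      plot(ecdf(x), main = "empirical CDF") # a step through every data point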

  3. Kernel density plots work well when you're comparing a few distributions on a single panel, when histograms are at best confusing and at worst useless.

    If the sample size is small, they will still be "shoogly", thus giving some indication of uncertainty.
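
    For example, in R with two made-up samples:

      set.seed(4)
      a <- rnorm(100, mean = 0, sd = 1)
      b <- rnorm(100, mean = 1, sd = 2)

      da <- density(a)
      db <- density(b)
      plot(da, xlim = range(da$x, db$x), ylim = range(da$y, db$y),
           main = "two densities on one panel")
      lines(db, lty = 2)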

  4. What about overlaying a kernel density on a histogram? This is what I do with all of my univariate distributional graphs. By doing so one gets the best of both worlds.
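
    In R, for instance, with arbitrary example data:

      set.seed(5)
      x <- rgamma(300, shape = 2)        # arbitrary example data
      hist(x, breaks = 30, freq = FALSE, main = "histogram with overlaid density")
      lines(density(x), lwd = 2)         # overlay on the density (not count) scale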

  5. I think I see what you're getting at. Histograms have their artifacts as well – particularly in a circular context, where the initial phase can make quite a difference – but I suppose the idea is that these limitations are obvious to the experienced eye? It's certainly true that the relationship of a kernel density estimator to the underlying data is more complicated than that of a histogram.

    I'm thinking, for example, of a single large peak that can produce either one very large bin value or two modest bin values. Perhaps there is a rule of thumb – "if any feature is less than two bins wide, use more bins"? – to avoid this problem, but it seems like all too often I don't really have enough photons to get any sense out of a finer binning.

    My current approach is essentially to plot a histogram and overplot the world's cheapest kernel density estimator: I just take the Fourier coefficients of the unbinned data, truncate them at some plausible number of harmonics, and plot the result. (One can do better, I know, by using a von Mises kernel instead of a sinc kernel, and one can also get an error band out of the estimator, but I usually keep it simple.)
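
    Roughly, the sketch I mean, in R with simulated angles (a crude truncated-Fourier-series estimate, so it can dip negative; illustration only):

      set.seed(6)
      theta <- rnorm(200, mean = pi, sd = 0.5) %% (2 * pi)   # fake circular data

      K <- 5                                   # harmonics kept
      grid <- seq(0, 2 * pi, length.out = 400)
      f <- rep(1 / (2 * pi), length(grid))
      for (k in 1:K) {
        a_k <- mean(cos(k * theta))            # empirical Fourier coefficients
        b_k <- mean(sin(k * theta))
        f <- f + (a_k * cos(k * grid) + b_k * sin(k * grid)) / pi
      }

      hist(theta, breaks = 24, freq = FALSE, main = "histogram + truncated Fourier estimate")
      lines(grid, f, lwd = 2)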

  6. Well, with all due modesty, I have an algorithm that solves all of these issues: Bayesian Blocks. It yields the optimal binning (not constrained to predefined bins in any sense), in the sense of maximizing a fitness function for a piecewise-constant model of the data (a simplified sketch of the recursion follows the references below). I am currently writing an update of the obsolete paper

    J. Scargle, "Studies in Astronomical Time Series Analysis. V. Bayesian Blocks, a New Method to Analyze Structure in Photon Counting Data," Astrophysical Journal 504 (1998), 405–418.

    (the old algorithm is greedy; the new one is described in a working draft on my website), but the math is at

    B. Jackson, J. D. Scargle, D. Barnes, S. Arabhi, A. Alt, P. Gioumousis, E. Gwin, P. Sangtrakulcharoen, L. Tan, and T. T. Tsai, "An algorithm for optimal partitioning of data on an interval," IEEE Signal Processing Letters, vol. 12, no. 2, pp. 105–108, Feb. 2005.
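
    For a rough feel for the recursion in that paper, here is a simplified R sketch of the O(n^2) dynamic program, using a constant-Poisson-rate log-likelihood as the block fitness and an ad hoc per-block penalty; it is an illustration only, not the published code, and the proper fitness functions and prior are in the papers above.

      # simplified optimal-partitioning sketch (illustration, not the published algorithm)
      bayesian_blocks <- function(t, penalty = 4) {
        t <- sort(t)
        n <- length(t)
        # cell edges: the data range endpoints plus midpoints between adjacent points
        edges <- c(t[1], (t[-1] + t[-n]) / 2, t[n])
        best <- numeric(n)    # best total fitness of the first k cells
        last <- integer(n)    # start cell of the final block in that optimum

        for (k in 1:n) {
          fit <- numeric(k)
          for (j in 1:k) {
            width <- edges[k + 1] - edges[j]
            count <- k - j + 1
            prev  <- if (j > 1) best[j - 1] else 0
            # block fitness: Poisson log-likelihood of a constant rate, minus a penalty
            fit[j] <- count * (log(count) - log(width)) - penalty + prev
          }
          best[k] <- max(fit)
          last[k] <- which.max(fit)
        }

        # walk back through the optimal change points
        cp <- integer(0)
        k <- n
        while (k > 0) { cp <- c(last[k], cp); k <- last[k] - 1 }
        edges[c(cp, n + 1)]   # block boundaries
      }

      x <- c(rexp(100, rate = 1), 3 + rexp(100, rate = 5))  # fake event times with a rate change
      bayesian_blocks(x)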

  7. I presume the method you describe would also be suitable for species abundance data. A lot of theory rides on plots of number of individuals versus number of species, which are displayed as histograms with arbitrary bin widths. Comparison with kernel plots is poor, as you would expect, and the kernel is itself a choice, so a less arbitrary method would be interesting.
