“What do you think about curved lines connecting discrete data-points?”

John Keltz writes:

What do you think about curved lines connecting discrete data-points? (For example, here.)

The problem with the smoothed graph is that it seems to imply that something is going on in between the discrete data points, which is false. However, the straight-line version isn’t representing actual events either; it is just helping the eye connect each point. So maybe the curved version is also just helping the eye connect each point, and looks better doing it.

In my own work (value-added modeling of achievement test scores) I use straight lines, but I guess I am not too bothered when people use smoothing. I’d appreciate your input.

Regular readers will be unsurprised that, yes, I have an opinion on this one, and that this opinion is connected to some more general ideas about statistical graphics.

In general I’m not a fan of the curved lines. They’re ok, but I don’t really see the point. I can connect the dots just fine without the curves.

The more general idea is that the line, whether curved or straight, serves two purposes: first, it’s an (interpolative) estimate of some continuous curve; second, it makes short-term trends apparent in a way that’s harder to see from the points alone. The straight line also serves a third purpose, which is to make clear the reliance on the original data. To put it another way, if I’m doing a simple interpolation, I want it to be clear that I’m doing a simple interpolation–and the straight lines make this clear. The curved lines, not so much. Maybe they’re from some model? To me, the added ambiguity is more of a cost than the smoother interpolation is a benefit.
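
To make the ambiguity concrete, here is a minimal sketch in Python. The data are made up, and the uniform Catmull-Rom spline is an assumption standing in for whatever smoothing a given graphics package applies; it is just one common way to draw a smooth curve that passes through every point.

```python
def linear(p1, p2, t):
    """Straight-line interpolation between p1 and p2 at fraction t in [0, 1]."""
    return p1 + (p2 - p1) * t


def catmull_rom(p0, p1, p2, p3, t):
    """Uniform Catmull-Rom cubic between p1 and p2; p0 and p3 are the neighbors."""
    return 0.5 * (
        2 * p1
        + (-p0 + p2) * t
        + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
        + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3
    )


# Made-up yearly values (hypothetical): flat at 0, then a jump to 1.
y = [0, 0, 0, 1, 1, 1]

# Midpoint of the segment between y[1] and y[2], which are both 0.
straight_mid = linear(y[1], y[2], 0.5)
curved_mid = catmull_rom(y[0], y[1], y[2], y[3], 0.5)

print(straight_mid)  # 0.0
print(curved_mid)    # -0.0625
```

Between two points that both equal 0, the straight line stays at 0, while the curve dips to -0.0625. That dip comes entirely from the interpolation rule reacting to the later jump, not from anything in the data, and a reader has no way to tell that from the graph.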

8 thoughts on “What do you think about curved lines connecting discrete data-points?”

  1. I think you may not have understood the question. My interpretation is that he's not asking about situations where observations just happen to have been taken at intervals of (say) one year, and we might try to interpolate, but rather situations where the quantity is only defined at yearly points. For instance, the number of entrants in a tournament held yearly.

    I think that even connecting the points with straight lines can then be misleading. I guess it depends on whether one can reasonably view the quantity observed as being indicative of some other quantity that actually is well-defined at intermediate points. Then connecting with lines, or even curves, might be OK. For curves, though, I'd think one would do some sort of smoothing that wouldn't lead to the curve passing exactly through the data points.

  2. I think the choice has a lot to do with how accurate the measurements are. If the individual points really do fall within a small error bar of the plotted position (like for example they are temperature or length or dollar cost or something where the measurement apparatus is relatively precise) and especially when the data are a time series of the same measurement, then smooth interpolation is much preferable.

    In these cases a smooth curve represents what happened much more accurately. For example, a linear interpolation has a step-function first derivative and a second derivative that is infinite at each data point. It's unsurprising that in social sciences such considerations are not top in Andrew's mind, but in physical sciences the derivatives of a curve have important meaning.

  3. I am curious what you or others think of the following:

    for N data points, draw lines representing the best fit model* from either
    a) nested polynomials of orders 0 to N-1
    or
    b) any plausible model with > N parameters, e.g. with prior constraints
    or other?

    !extra credit (what I am really interested in knowing):
    d) would your answer above change if there were replicate observations at each data point?

    *or lines, shaded proportional to prob(model_i | data) calculated following Bayes' theorem or Akaike weights or such.

    Thanks in advance to all for the feedback.

  4. I am surprised to find you not much more strongly against joining them at all, let alone by curved lines.

    If I understand it right, the x-variable is nominal there.

    What is gained by giving the impression that there's something "between" categories?

    What does it mean when a curve peaks slightly to the left or right of a point?

    What does the point of inflexion mean on some of those curves?

    (My answer is "nothing", every time. But it *looks* like it might mean something at first.)

    The "information" conveyed by joining the points is misleading. The information conveyed by using curved lines doubly so.

  5. "curved" and "straight" lines… If it's curved is it a line? And if it's a line of course it's straight!

  6. I basically agree with these points, but just wanted to give another reason for the lines: in time series with dense points, it can be hard to see how the points are ordered w/out some sort of line between them. Straight lines are probably better for that use, since they're shorter.

    I haven't really seen people connect the points outside of time series.

    –Gray
