
Hey, what’s up with that x-axis??

[Image: screenshot of the CDC graph]

CDC should know better.

P.S. In comments, Zachary David supplies this correctly-scaled version:


It would be better to label the lines directly than to use a legend, and the y-axis is off by a factor of 100, but I can hardly complain given that he just whipped this graph up for us.

The real point is that, once the x-axis is scaled correctly, the shapes of the curves change! So that original graph really was misleading, in that it incorrectly implies a ramping up in the 3-10 year range.
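To see why the spacing matters, here is a minimal sketch in Python. The numbers are made up for illustration and are not the values from the CDC table; the names `durations_years`, `cumulative_pct`, and `segment_slopes` are likewise hypothetical. It compares the slope of each line segment when the ticks are treated as equally spaced categories with the slope when they are placed at their actual values in years.

```python
# Hypothetical illustration only: these numbers are NOT taken from the CDC table.
durations_years = [1, 3, 5, 10, 15, 20]                 # unevenly spaced tick values
cumulative_pct = [5.0, 14.0, 20.0, 33.0, 43.0, 50.0]    # cumulative percent disrupted


def segment_slopes(x, y):
    """Return the slope of each straight segment joining consecutive points."""
    return [(y1 - y0) / (x1 - x0)
            for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:])]


# Ticks treated as categories: every tick sits one unit from the next.
categorical_x = list(range(len(durations_years)))
slopes_per_tick = segment_slopes(categorical_x, cumulative_pct)

# Ticks placed at their numeric value, so slopes are in percent per year.
slopes_per_year = segment_slopes(durations_years, cumulative_pct)

print("per tick:", slopes_per_tick)
print("per year:", slopes_per_year)
```

With these made-up values, the per-tick slopes jump upward on the segment from 5 to 10 years, which reads as a ramping up, while the per-year slopes fall steadily from one segment to the next. The same points give two different stories depending only on where the ticks are placed.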

P.P.S. Zachary David sent me an improved version:


Ideally the line labels would be colored so there’d be no need for the legend at all, but at this point I really shouldn’t be complaining.


  1. Yeah that’s pretty bad. It looks like that’s how they did every graph in that paper.

    Apparently it’s just a graphical representation of the table on Page 55 where each x-axis tick corresponds to a column.

    I used the data from the table and remade the graph as it should look:

    • Eric Loken says:

      Thanks for that! Opposite curvature at the 5-year inflection. A different visual experience, and a different interpretation of risk over time.

      • You’re welcome! The difference in curvature certainly suggests a completely different interpretation.

        (Now if I could just remind myself to format percentages correctly for visualizations. It’s probably been 5 years since I last bothered to multiply a percentage by 100.)

  2. Jonathan (another one) says:

    A lot of the graphs in the paper are like that. I took it to be some sort of hyperbolic discounting. Irrational, but accurate!

  3. Elrod says:

    Typo: “The real point is that, once the y-axis is scaled correctly”

    If one wanted to use logarithms for intuitive purposes (à la the Weber-Fechner law), what do you think of taking a logarithm of both axes?

  4. Mark says:

    I’m much more concerned with the figure caption than with scaling of the X axis. First, “probability” has very little to do with whether any particular marriage breaks up (well, except for maybe the marriage of a statistician who is infatuated with the concept of probability)…. the Y axis is correctly labeled, this is simply the “percent disrupted”. Second, the X axis is still mis-labeled, this is not at all by “duration of marriage” (e.g., those who have been married for 10 years are not necessarily more likely to split than those who have been married for 5 years… although see the above caveat regarding statisticians again…). The X axis should properly be something like “Time since marriage”, because Kaplan-Meier plots plot the cumulative incidence of the event.

    • Rahul says:

      Actually, the yearly probabilities would be interesting to see. Is there a peak? Are marriages most likely to break in (say) the 2nd year?

      Would that curve be calculable simply as the derivative of this curve?

      • David P says:

        Back when I was studying these things (mid-1980s) the peak divorce rate was in the third year of marriage, and the cohort peak for that peak was the rate of divorce for the 1976 marriage cohort in 1979. (I only remember this because I got married in 1976, although that marriage has survived.)

        • Rahul says:

          Thanks! I was eyeballing the correctly scaled figure Zachary made and the slope of the non-Hispanic Black line seems to go from low to high to low around the same 3-5 year mark.

          So, yes, what you say probably matches. The breakup peak probability seems bracketed in the 3-5 year mark. At least for blacks.

  5. One interesting question arises from all of these axis errors: what plotting software were they using that didn’t automatically identify the x-axis as a numerical value? The resulting graphs are as if the software treated the entries as categorical variables.

  6. Chris G says:

    So if I start with the presumption dm/dt = -k(m-m_inf) where m(t) = Pr(married) and m_inf is the asymptotic value of Pr(married) then the cumulative probability of “disruption”, i.e., ~married, is (1-m_inf)*(1-exp(-kt)) where k is the probability of disruption per year. Eyeballing the data, it looks like that function could fit the data reasonably well.
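The closed form in the last comment can be checked numerically. The sketch below assumes m(0) = 1 (everyone starts out married), which the comment leaves implicit, and uses made-up values for `k` and `m_inf` rather than anything fitted to the data. In this model `k` is a rate constant with units of 1/year. The function names are hypothetical.

```python
import math


def cumulative_disruption(t, k, m_inf):
    """Cumulative probability of disruption by time t, assuming m(0) = 1."""
    return (1.0 - m_inf) * (1.0 - math.exp(-k * t))


def married_fraction(t, k, m_inf):
    """m(t) = Pr(still married at time t) under the same model."""
    return 1.0 - cumulative_disruption(t, k, m_inf)


# Hypothetical parameter values, chosen only to exercise the formula.
k, m_inf = 0.1, 0.5
t, h = 4.0, 1e-5

# Central-difference estimate of dm/dt at time t.
dm_dt = (married_fraction(t + h, k, m_inf)
         - married_fraction(t - h, k, m_inf)) / (2.0 * h)

# Right-hand side of the assumed equation dm/dt = -k (m - m_inf).
rhs = -k * (married_fraction(t, k, m_inf) - m_inf)

print(dm_dt, rhs)
```

The two printed values should agree to many decimal places, which confirms that the stated cumulative curve solves the differential equation. Whether it fits the CDC data is a separate question that would need the values from the table.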
