
The real dogs and the stat-dogs


One of the earliest examples of simulation-based model checking in statistics comes from the 1954 book by Robert Bush and Frederick Mosteller, Stochastic Models for Learning.

They fit a probability model to data on dogs being shocked in a research lab (yeah, I know, not an experiment that would be done today). Then they simulate “stat-dogs” from their fitted model and compare them to the real dogs.

It’s a posterior predictive check! OK, only approximately, as they use a point estimate of the parameters, not the full posterior distribution. But their model has only 2 parameters and they’re estimated pretty well, so that’s no big deal. Also, they only simulate a single random replicated dataset, but that works ok here because the data have internal replication (an independent series on each of 30 dogs).
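Here is a minimal sketch of that kind of check, not Bush and Mosteller's actual procedure. It assumes, purely for illustration, a two-parameter model in which a dog's shock probability starts at 1 and is multiplied by one factor after each shock and by another after each avoidance. The function names, the 25 trials, and the parameter values 0.92 and 0.80 are all hypothetical; only the 30 dogs come from the description above.

```python
import numpy as np

def simulate_stat_dogs(n_dogs, n_trials, shock_factor, avoid_factor, rng):
    """Simulate one replicated dataset: rows are dogs, columns are trials.

    Entries are 1 for a shock and 0 for an avoidance. The multiplicative
    update of the shock probability is an assumed model for illustration.
    """
    y_rep = np.zeros((n_dogs, n_trials), dtype=int)
    for d in range(n_dogs):
        p_shock = 1.0  # assumption: every dog is shocked on the first trial
        for t in range(n_trials):
            y_rep[d, t] = rng.random() < p_shock
            # Learning step: shrink the shock probability after each outcome.
            p_shock *= shock_factor if y_rep[d, t] == 1 else avoid_factor
    return y_rep

def total_shocks(y):
    """Per-dog test statistic: number of shocks across all trials."""
    return y.sum(axis=1)

rng = np.random.default_rng(0)
# Point estimates plugged in directly (hypothetical values), one replication.
y_rep = simulate_stat_dogs(n_dogs=30, n_trials=25,
                           shock_factor=0.92, avoid_factor=0.80, rng=rng)
stat_rep = total_shocks(y_rep)
print("replicated mean and sd of total shocks:", stat_rep.mean(), stat_rep.std())
```

Given the observed 0/1 matrix y in the same layout, total_shocks(y) can be set beside total_shocks(y_rep). With 30 independent series, the spread across dogs gives a rough sense of how much the two should differ by chance.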

The responses of real dogs, that is, the observed data, y, are shown above.

Here are the “stat-dogs,” the replicated data, y_rep:


And here are some comparisons:



Jennifer and I pick up this example in chapter 24 of our book and consider how to make the graphical comparison more effective. We also talk about some other models and about how the variation among dogs can be explained by some combination of variation between dogs and positive feedback learning (the idea that a dog learns more from an avoidance than from a shock). With such a small dataset it’s hard to untangle these two explanations.


  1. My comment is not directly relevant to the overall point of the post, but I wanted to address this statement: “They fit a probability model to data on dogs being shocked in a research lab (yeah, I know, not an experiment that would be done today).” I don’t know whether dogs are still being shocked in learning experiments, but it’s certainly true that all sorts of cruel experiments are still being done to dogs and other sentient creatures.

    • David Condon says:

      Those are all examples from medical studies, which have separate review processes. It’s pretty much impossible to do shock treatment studies of animals in psychology today.

  2. I ordered “Data Analysis Using Regression and Multilevel/Hierarchical Models” and look forward to chapter 24 (and the rest)! I intend to read it in sequence and in full, so it might take me a while to reach the dogs.

  3. Anoneuoid says:

    I asked earlier what model was used, but it never showed up. Here is what I got after a bit of manually tuning Thurstone’s[1]:
    Statistic                     Mean    SD
    TrialsBeforeFirstAvoidance    4.60    2.44
    TrialsBeforeSecondAvoidance   6.50    2.05
    TotalShocks                   6.00    1.11
    TrialofLastShock              10.70   4.37
    Alternations                  4.13    2.30

    [1] Thurstone, L.L. The learning function. J. Gen. Psychol., 1930, 3, 469-493. code:

    • Anoneuoid says:

      Ok, it appears to be this model:
      A mathematical model for simple learning. Bush, Robert R.; Mosteller, Frederick. Psychological Review, Vol 58(5), Sep 1951, 313-323.

      They derive this model by saying that learning involves changing the probability p that an event (e.g., shock) occurs, as some function of the current probability. Then they say that (whatever this function is) we can expand it as a power series and keep the first two terms to get p[t+1] = a0 + a1*p[t], where a0 and a1 are constants. Then they say that p should change in proportion to the maximum possible amount of change, so they define a0 = a and a1 = 1-a-b. This gives p[t+1] = p[t] + a*(1-p[t]) - b*p[t]. So the second term is the amount of “positive” learning that can occur, while the third is the amount of “negative” learning. The a and b constants are the positive and negative learning rates.
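      A quick sketch of iterating that update rule, as I read it from the paper. The function name and the values of p0, a, and b are made up for illustration.

```python
def linear_operator_curve(p0, a, b, n_steps):
    """Iterate p[t+1] = p[t] + a*(1 - p[t]) - b*p[t], starting from p0."""
    p = [p0]
    for _ in range(n_steps):
        current = p[-1]
        p.append(current + a * (1 - current) - b * current)
    return p

# Hypothetical learning rates, chosen only to show the shape of the curve.
a, b = 0.2, 0.1
curve = linear_operator_curve(p0=0.0, a=a, b=b, n_steps=100)

# Setting p[t+1] = p[t] in the update gives the fixed point a / (a + b).
limit = a / (a + b)
print("final p:", curve[-1], "fixed point:", limit)

# Each step closes the same fraction (a + b) of the remaining gap.
steps = [curve[t + 1] - curve[t] for t in range(5)]
print("first few increments:", steps)
```

      Since p[t+1] - a/(a+b) = (1-a-b)*(p[t] - a/(a+b)), the gap to the fixed point shrinks by the same factor at every step. For 0 < a+b < 1 the curve is monotonic and its increments only get smaller.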

      I don’t think you can get a sigmoid learning curve with this, and the monotonic curve it is capable of producing may arise as an averaging artifact (this is also pointed out by Thurstone in that 1930 paper):

      “The negatively accelerated, gradually increasing learning curve is an artifact of group averaging in several commonly used basic learning paradigms (pigeon autoshaping, delay- and trace-eye-blink conditioning in the rabbit and rat, autoshaped hopper entry in the rat, plus maze performance in the rat, and water maze performance in the mouse).”

      I think Thurstone’s idea of conceptualizing learning as an urn problem is both easier to relate to mechanism and can produce a more flexible set of curves while using the same number of parameters.
