A stunned Dyson

Terry Martin writes:

I ran into this quote and thought you might enjoy it. It’s from p. 273 of Segre’s new biography of Fermi, The Pope of Physics:

When Dyson met with him in 1953, Fermi welcomed him politely, but he quickly put aside the graphs he was being shown indicating agreement between theory and experiment. His verdict, as Dyson remembered, was “There are two ways of doing calculations in theoretical physics. One way, and this is the way I prefer, is to have a clear physical picture of the process you are calculating. The other way is to have a precise and self-consistent mathematical formalism. You have neither.” When a stunned Dyson tried to counter by emphasizing the agreement between experiment and the calculations, Fermi asked him how many free parameters he had used to obtain the fit. Smiling after being told “Four,” Fermi remarked, “I remember my old friend Johnny von Neumann used to say, with four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” There was little to add.

My reply: I’d have loved to have met Fermi or Ulam. Something about von Neumann really irritates me, though. That elephant quote just seems like bragging! For one thing, I can have a model with a lot more than five parameters and still struggle to fit my data.


  1. Jag Bhalla says:

    Perhaps the focus shouldn’t be on the free parameters, but on the relationships they try to configure.
    As Steve Pinker says “No sane thinker would try to explain World War I in the language of physics.”
    Can “tools honed in the “olicausal sciences” (oli = few), handle… greater complexities…”
    Perhaps many of life’s patterns remain beyond “the numbers.”

  2. Eric says:

    Apparently somebody in 2009 managed to fit an elephant with 4 parameters and wiggle its trunk with 5.


  3. Jonathan (another one) says:

    Apocryphal, I presume, but you never know with Von Neumann.

    The great mathematician John Von Neumann was consulted by a group who was building a rocket ship to send into outer space. When he saw the incomplete structure, he asked, “Where did you get the plans for this ship?”
    He was told, “We have our own staff of engineers.”
    He disdainfully replied: “Engineers! Why, I have completely sewn up the whole mathematical theory of rocketry. See my paper of 1952.”

    Well, the group consulted the 1952 paper, completely scrapped their 10 million dollar structure, and rebuilt the rocket exactly according to Von Neumann’s plans. The minute they launched it, the entire structure blew up. They angrily called Von Neumann back and said: “We followed your instructions to the letter. Yet when we started it, it blew up! Why?”
    Von Neumann replied, “Ah, yes; that is technically known as the blow-up problem – I treated that in my paper of 1954.”

  4. Martha (Smith) says:

    I never thought of the elephant quote as bragging — rather, as humorous hyperbole that makes a good point.

    • Richard McElreath says:

      Agree. Useful as a memorable nudge against overfitting.

      But I also sympathize with Andrew’s point about struggling to fit models. There is a naive perspective that adding parameters somehow makes it easy to fit a sample. But when those parameters are structured in groups to achieve proper pooling, it ain’t necessarily so.

      Indeed, I find often that when I add parameters (sometimes 100s of parameters), I then end up with fewer “effective” parameters as a result of pooling and get a poorer fit to sample out of it. That’s all as it should be, but it isn’t captured by the elephant notion.

      I think about this a lot lately, as I’m working on a Stan project with literally 20-thousand parameters. The parameters arise from the hierarchical structure of the data (it’s a big sample), and do nothing but pool and shrink estimates. These parameters don’t give me freedom at all. They make it harder to overfit my sample!
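The point about pooling can be made concrete with a small sketch. It is not Richard's Stan model; it assumes the simplest possible case: one intercept per group, normal errors with a known standard deviation `sigma`, and a normal prior on the intercepts with known mean 0 and known standard deviation `tau` (in a real hierarchical model `tau` would be estimated). Under those assumptions the fitted values are a linear function of the data, and the trace of the hat matrix, the sum of n_j / (n_j + sigma^2 / tau^2) over groups, is one common count of "effective" parameters. The function name and the numbers are illustrative only.

```python
def effective_parameters(group_sizes, sigma, tau):
    """Effective number of parameters for partially pooled group intercepts.

    Assumes a known prior mean of 0 and known sigma and tau, so each group's
    shrunken estimate is sum(y_j) / (n_j + lam) with lam = sigma^2 / tau^2.
    Each group then contributes n_j / (n_j + lam) to the hat-matrix trace.
    """
    lam = sigma**2 / tau**2
    return sum(n / (n + lam) for n in group_sizes)


# 100 group intercepts, 5 observations per group (made-up sizes).
sizes = [5] * 100

# A tighter prior (smaller tau) means more pooling and fewer effective parameters.
for tau in (10.0, 1.0, 0.1):
    p_eff = effective_parameters(sizes, sigma=1.0, tau=tau)
    print(f"tau = {tau:5.1f}: {len(sizes)} nominal parameters, {p_eff:.1f} effective")
```

With a wide prior the 100 intercepts behave almost like 100 free parameters; with a tight prior they behave like only a handful, even though the nominal parameter count has not changed.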

      • Me too on this one. It took me months to figure out how to structure a spatial model of certain economic issues in the American Community Survey. There was a smooth function of space that had 10 or 15 parameters which acted as a prior for several parameters per public use microdata area, and then each public use microdata area got its own function of income from those parameters, and then 5 or 10 households per PUMA (sub-sample to keep the computation do-able) had to fit this function, and the errors are expected to be right skewed, so those error shape parameters come into the whole thing… To get the whole thing to be identifiable required careful thought about how to organize the parameterization.

        The end result is it works, but after all that work, I took a break from the project because it was a little overwhelming. This was all just the first stage in then predicting from this information some frequencies of various behaviors… so I’m looking forward to having the energy to go back to it… but with all this it was literally thousands of parameters by count (there are 2000+ microdata areas in the US) but, and this is the key, it was also in some sense thousands of dimensions.

        In the end effectively with the spatial pooling, the individual parameters can’t move independently, so it’s effectively something between the 10-15 parameters of the smooth spatial function and the 2000+ parameters of the individual microdata areas, pooled together through the smooth function, probably effectively a few hundred dimensions of parameter space fitting thousands of dimensions of data space.

        This is exactly why Stan is required. The manifold that Stan has to fly around in in parameter space is tightly constrained, all the parameters have to move together because of the structure of the whole thing. Thinking about how that works in order to build a better model is really interesting. But it’s far beyond what Von Neumann was thinking of.

        • jrc says:

          A smoothed set of spatial and temporal controls could be a really interesting tool for the social science researchers who like to pool multiple ACS/CPS rounds and look at changes in the world across space and time. I’d be pretty curious about results from a model like that compared to the standard spatio-temporal fixed-effects models common in the applied microeconomics literature (think indicator variables for time and geography). If you had it all set up, and put whatever (wages?) on the left, and randomly assigned placebo-treatments on the right (at whatever spatio-temporal level according to whatever rules), you could compare the variability of the two estimators at different geographic levels of treatment-assignment using the real underlying data.

          The mean-differencing/fixed-effects models used in applied micro have a very nice interpretive framework for thinking about the relationship between Theta and the world (to bring it back to another recent topic), generally of a “difference-in-difference” relative-changes type. But they generate really noisy estimates. And I’ve never seen a good test of the precision gains from smoothing like you suggest in this kind of context. And if you could bound any potential bias relative to the fixed-effects estimates (in some manner), you might change the way people estimate individual-level responses to aggregate/local changes in the world (policy, macroeconomic conditions, exposures of various kinds).

      • Martha (Smith) says:

        Richard said: “There is a naive perspective that adding parameters somehow makes it easy to fit a sample. But when those parameters are structured in groups to achieve proper pooling, it ain’t necessarily so.”

        But remember that when von Neumann stated his elephant quip, there was no handy software to do things like structuring parameters in groups to assure proper pooling. Mechanical (not electronic) calculators could help with the numerical calculations, but computers were very primitive compared to what is available today (vacuum tubes, tape memory, assembly language, punch cards, etc.)

      • Keith O'Rourke says:

        I think astronomers were a bit more explicit about parameters being common [pooling] (and in what sense [partial pooling]) starting in the early 1800s.

        Some pointers to that here –

        • Martha (Smith) says:

          This does not surprise me — my understanding is that astronomers may have been the first to consider random as well as fixed factors (e.g., weather conditions needed to be considered as random factors).

    • Rahul says:


      And the number of times I’ve seen academics claim they’ve discovered an “effect” just because they got a great fit. And when you start digging deeper into their models there’s a gazillion ways they could have tuned the model to fit pretty much any data.

  5. Sam says:

    My understanding is that von Neumann’s remark is on fitting probability distributions rather than statistical models.

  6. Dave says:

    “Although the fit of the observed data to the theoretical curves was excellent, it would be a strange curve indeed that could not be fit with four parameters!” (Richard Lewontin, The Genetic Basis of Evolutionary Change, 1974, p. 234) I’ve always enjoyed this quote for the exclamation mark, which is rare in scientific texts. This is in a discussion of early attempts to ascertain whether selection is acting in a population on a specific trait. It’s followed by a discussion that is somewhat reminiscent of some of Andrew’s concerns about the shift from main effects to interactions when a hypothesis fails to fit the data. Perhaps methodological problems are especially pressing when good measurements aren’t available to test theories, something that was definitely true of evolutionary genetics in the relevant period.

  7. JK says:

    From Ulam’s Adventures of a Mathematician, a story told to Ulam by Ed Condon:

    ‘The lecturer produced a slide with many experimental points and, although they were badly scattered, he showed how they lay on a curve. According to Condon, von Neumann murmured, “At least they lie on a plane.”‘ (p104)

    • Martha (Smith) says:

      This brings to mind a point that is often overlooked, so I usually tried to point it out when teaching regression: When considering three variables, they may be linearly related, but if you drop one variable, there is no reason to assume that the remaining two are linearly related. (Practical consequence: E(z|x,y) = ax + by doesn’t imply E(z|x) = cx. For example, suppose z = x + y where y = x^2.)

      (Of course, if the variables are joint multivariate normal, there is no problem. So it’s a good example of how model assumptions can come into play in practical work.)
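The z = x + y, y = x^2 example above can be checked numerically. The sketch below is only an illustration on a made-up, noise-free grid of x values: z is exactly linear in (x, y), but the best least-squares line through the origin in x alone leaves residuals equal to x^2, so E(z|x) = x + x^2 is not of the form cx.

```python
# Noise-free grid, symmetric around zero (an arbitrary choice for illustration).
xs = [i / 10 for i in range(-20, 21)]
ys = [x**2 for x in xs]                 # y is a nonlinear function of x
zs = [x + y for x, y in zip(xs, ys)]    # z = 1*x + 1*y exactly

# Using both x and y, the linear form a*x + b*y with a = b = 1 is exact.
max_err_xy = max(abs(z - (x + y)) for x, y, z in zip(xs, ys, zs))

# Dropping y: least-squares slope for z ~ c*x (no intercept, matching E(z|x) = cx).
c = sum(x * z for x, z in zip(xs, zs)) / sum(x * x for x in xs)
residuals = [z - c * x for x, z in zip(xs, zs)]
max_err_x = max(abs(r) for r in residuals)

print("max error using x and y:", max_err_xy)
print("fitted slope c using x only:", c)
print("max error using x only:", max_err_x)
```

The residuals from the x-only fit are not noise; they trace out the parabola that was dropped along with y.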

  8. Oliver says:

    One professor on my dissertation committee worked with Ulam as a young faculty member or postdoc. According to him, Ulam was a great guy fully capable of providing exact answers to the right questions. Also, the professor thought that of the Manhattan Project mathematicians, Ulam did all the work, Teller spent all his time developing plans to blow up the world and Von Neumann farted around.

  9. Stephen says:

    Aphorisms are such fun! But they are seldom reliable indicators of the person concerned, because the quote is memorable because it is, well, memorable and out of the ordinary. An outlier observation, if you like… To adoring fans of the writer of the aphorism, it’s confirmation of the genius of the person concerned. To detractors, it’s confirmation that the person was arrogant, ignorant, or both. Historians have long since ceased to rely on them for serious assessments of a person’s deep intentions and actions. It doesn’t help that most reported quotes are misinterpreted, exaggerated, or just downright wrong. But as I said: fun.

    Von Neumann, Teller, Ulam, and so many others were a product of their time, which is a long long time ago. In a modern world they might not survive the way we work. Or maybe they’d thrive. Who can tell? To many (myself included) they were all unreasonable men. It’s not a pejorative – here’s another aphorism (from another unreasonable man):

    “The reasonable man adapts himself to the world:
    the unreasonable one persists in trying to adapt the world to himself.
    Therefore all progress depends on the unreasonable man”.

    — George Bernard Shaw, “Maxims for Revolutionists” (1903).

    • Though I like Neil deGrasse Tyson, this sentence is silly. (“In science, when human behavior enters the equation, things go nonlinear. That’s why Physics is easy and Sociology is hard.”) First, modern-day physics spends a lot of effort looking at non-linear dynamics, and often does a pretty good job.

      The second reason is more important, and relates to several comments in this thread that I sadly don’t have time to respond to at length. In brief: Areas of science that have been successful (like Physics) aren’t successful just because they’re simple, but because an inherent part of their approach is understanding and characterizing the noise in the things they study, and only making claims when they’re warranted. An example: Gravitational waves have been around forever. We’ve known they should exist for a century. The people interested in them could have, let’s say back in 1990, used the technology of the day, done some p-hacking, noise mining, etc., and announced the detection of gravitational waves with the same “rigor” as we’re seeing in himmicanes, etc. They could have addressed criticisms by saying: “it’s really hard; these are deflections of 10^-20 m we’re trying to measure!” But, thankfully, they didn’t; they kept working on measurement and analysis techniques (which are really impressive, by the way) until it was actually justifiable to make a statement about gravitational waves. (Making weaker statements about limits on detection, progress towards sensitivity goals, etc., is also perfectly fine, and is thankfully acceptable in some fields of physics.) The fact that your system is noisy, non-linear, or whatever doesn’t justify making unwarranted claims about it.

      Yes, science is hard. As someone who runs a lab, I can attest that it’s physically and mentally exhausting (though sometimes thrilling). I can understand why people feel that having spent years making measurements they “should” be able to draw a strong conclusion from them. The emotional challenge of this is a topic for another day… For now I’ll just reiterate that the problems of these areas of Psychology, etc., are not due to their dissimilarity to Physics in subject matter.

      • Andrew says:


        Good answer, and also another example of how blog is better than twitter. Your thoughtful comment is something Tyson will never see. But if he had a blog, then he and his readers would see it and learn something.

      • Martha (Smith) says:

        Raghu said, “Though I like Neil deGrasse Tyson, this sentence is silly. (“In science, when human behavior enters the equation, things go nonlinear. That’s why Physics is easy and Sociology is hard.”) First, modern-day physics spends a lot of effort looking at non-linear dynamics, and often does a pretty good job.”

        My experience is that some physicists use the phrase “goes nonlinear” in a figurative way to say something “gets a lot more complicated”; that’s how I interpreted Tyson’s comment (rather than as referring to the literal technical meaning of “nonlinear”)
