Flame bait

Mark Palko asks what I think of this article by Francisco Louca, who writes about “‘hybridization’, a synthesis between Fisherian and Neyman-Pearsonian precepts, defined as a number of practical proceedings for statistical testing and inference that were developed notwithstanding the original authors, as an eventual convergence between what they considered to be radically irreconcilable.”

To me, the statistical ideas in this paper are too old-fashioned. The issue is not that the Neyman-Pearson and Fisher approaches are “irreconcilable” but rather that neither does the job in the sort of hard problems that face statistical science today. I’m thinking of technically difficult models such as hierarchical Gaussian processes and also challenges that arise with small sample size and multiple testing. Neyman, Pearson, and Fisher all were brilliant, and they all developed statistical methods that remain useful today, but I think their foundations are out of date. Yes, we currently use many of Fisher’s, Neyman’s, and Pearson’s ideas, but I don’t think either of their philosophies, or any convex mixture of the two, will really work anymore, as general frameworks for inference. Ioannidis, Bem, Simonsohn, Kanazawa, etc. Not to mention hierarchical models.

32 thoughts on “Flame bait”

  1. The latest technical developments scarcely indicate moving away from the underlying statistical rationale of sampling theory, but rather the opposite; and the need for a general philosophical perspective both to direct and critique uses and abuses of statistical method (old and new) is greater than ever. Birnbaum’s remark, about the “rock in a shifting scene”, recently quoted on my blog, comes to mind:

    “If there has been ‘one rock in a shifting scene’ of general statistical thinking and practice in recent decades, it has not been the likelihood concept, as Edwards suggests, but rather the concept by which confidence limits and hypothesis tests are usually interpreted, which we may call the confidence concept of statistical evidence.” http://errorstatistics.com/2013/05/27/a-birnbaum-statistical-methods-in-scientific-inference/

    I think it still holds true. The connections of your own work to error statistical notions further bear this out (e.g., Gelman and Shalizi).

    • I think you’re half right. That quote from Birnbaum dates to 1970, so the “recent decades” he’s talking about are the ’50s and ’60s. There’s been some water under the bridge since then. The most sophisticated fields today include data mining, pattern recognition, machine learning, multilevel/hierarchical Bayesian weather models, image reconstruction, natural language translation, nuclear magnetic resonance imaging, radar target discrimination, and so on, which barely existed back then and basically never even mention sampling theory, classical hypothesis testing, or confidence intervals.

      I think that was Gelman’s point about not being general enough for today’s questions.

      The latest developments you’re talking about are probably from fields like pharmacology, psychology, and economics. I think we can all agree the sampling theory viewpoint continues to do wonders for those areas. As evidence, we need look no further than the constant cries from researchers working in less fortunate fields: “If only we could be more like Pharmacology, Psychology, or Economics!”

  2. You call that flame bait? Check out that Edmund Burke quote on the second page of the paper:

    “But the age of chivalry is gone. That of sophisters, economists, and calculators, has succeeded; and the glory of Europe is extinguished for ever.” -Edmund Burke, 1790

    That’s how “flame bait” is done.

  3. “Neyman, Pearson, and Fisher all were brilliant, and they all developed statistical methods that remain useful today, but I think their foundations are out of date”

    What foundations? ;-)

  4. Phayes: Exactly, and that is why providing a foundation from the perspective of today’s issues is a task that is brand new and has scarcely even begun. Having thrown off the yoke of some of the ancient ideas as to what a general statistical philosophy needs to look like, we may discover why older methods work, when they do, and arrive at something beyond a hodge-podge of technical tricks.

    • Mayo: Yes – I think we agreed about the importance of thinking about foundations a while back (on your website), but not so much on what to think. :)

  5. There is no hope of understanding the hard problems properly as long as the not-so-hard ones are not understood. So as long as there is trouble with the foundations regarding the “simpler” problems that Fisher/Neyman-Pearson or Ramsey/de Finetti/Jaynes addressed, this trouble deserves some attention. Problems won’t go away by saying “these days we deal with more complex stuff so the old stuff is irrelevant”.

    • I couldn’t agree more. Those two approaches have very different implications for even the simplest of problems. If you flip a coin N times and count the percentage f of heads, then there are two versions:

      Version 1: Each of the 2^N possible sequences is equally likely. We know this somehow, even though N doesn’t have to be very large before there are more sequences than atoms in the universe. The goal of assigning a probability distribution is to model this mysterious physical property of the universe which we call randomness. Our confidence intervals, which suggest f~.5, come with guarantees because of that.

      Version 2: Each data point is a kind of projection of the state of the universe. Schematically we could say data_i = F[state(t_i)], where the universe evolves state(t_i) -> state(t_i+1) any old way it feels like. In truth we only know a limited amount about that evolution, but we do know that almost every one of the 2^N sequences has the property that f~.5. The goal of assigning a probability distribution is to take a kind of “majority vote”. If we had to guess what f is, we can’t do better (without knowing more) than to go with the majority and say f~.5. There are no guarantees this will happen, but in practice, almost no matter how state(t_i) -> state(t_i+1) evolves, it will lead to one of those “majority sequences” in which f~.5, so the guess is rather robust and tends to be what we actually observe. Or, to phrase it in a way relevant to physicists: if we do observe f~.5, it tells us almost nothing about state(t_i) -> state(t_i+1).

      These really are two different views of what’s happening, which lead to quite different implications and predictions even though they both make the same narrow prediction f~.5. I predict the future of statistics lies entirely with the second version, and that it’ll be difficult to make more than epsilon progress in statistics until version 1 is dropped completely.
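
      A minimal simulation of the “majority vote” point in version 2 might look like the following sketch (the sample sizes, the tolerance, and all names here are my own illustrative choices, not anything from the thread): it samples sequences uniformly and checks what share of them have f near .5.

      ```python
      import numpy as np

      rng = np.random.default_rng(0)

      N = 1000          # flips per sequence
      trials = 100_000  # sequences sampled uniformly from the 2^N possibilities
      eps = 0.05        # how close to 0.5 counts as "f ~ .5"

      # Sampling sequences uniformly is a way of probing what fraction of the
      # 2^N sequences satisfy |f - 0.5| < eps without enumerating them all.
      flips = rng.integers(0, 2, size=(trials, N))
      f = flips.mean(axis=1)
      majority_share = np.mean(np.abs(f - 0.5) < eps)

      print(f"share of sampled sequences with |f - 0.5| < {eps}: {majority_share:.4f}")
      # For N = 1000 this comes out very close to 1: almost every sequence is a
      # "majority sequence" with f ~ .5, which is why the guess f ~ .5 is robust
      # even though it says nearly nothing about how state(t_i) -> state(t_i+1).
      ```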

        • No. Smart people can take a bad idea, combine it with their brilliant intuition, and do quite a bit. But their brilliant intuition can only get them so far, and I think we’ve already reached the limit. From here on out, anyone wedded to version 1 is limited to making epsilon tweaks to what we have currently. At some point there are consequences for getting the model wrong.

        • It depends on the field. In non-equilibrium statistical mechanics, I’d say about 80 years ago. In Finance, I’d say about 30 years ago. Let me ask you a couple of questions.

          Name any field which uses frequentist/classical statistics heavily. How long has it been since that field saw a fundamental increase in predictive ability? In Finance, for example, the answer would be about 30 years ago.

          For that matter, what do you think the last great advance in Frequentist statistics was?

        • “Name any field which uses frequentist/classical statistics heavily. How long has it been since that field saw a fundamental increase in predictive ability? In Finance, for example, the answer would be about 30 years ago.”

          Frequentist recommender systems have improved significantly.

          “For that matter, what do you think the last great advance in Frequentist statistics was?”

          I’d have to say the bootstrap or regularization, but I might be suffering from a bit of hindsight bias.

        • Let me ask you something.

          Consider the example of 10 heads in 14 throws, where one is asked whether or not one will take a 1:1 bet on the next two throws both coming up heads. Bayesian and frequentist inference yield two different probabilities for this event, and thus two different decisions.

          How would you test these two approaches against each other? My choice would be some form of large-n simulation study, where the “true p” is varied between 0 and 1 and 16 throws are produced. We discard all runs that do not display exactly ten heads in the first 14 throws. Would you agree that the proportion of HH in the last two throws can function as a validation of either inference? (I’m merely asking because I find it hard to figure out whether such a situation would function as a true validation within the Bayesian model of probability.)
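
          For what it’s worth, here is a minimal sketch of how that simulation study could be coded (the simulation size, the uniform draw for the “true p”, and the two reference numbers are my own choices and arithmetic, not anything established in this thread):

          ```python
          import numpy as np

          rng = np.random.default_rng(1)

          # Proposed study: draw a "true p", produce 16 throws, keep only runs
          # with exactly 10 heads in the first 14, and see how often the last
          # two throws are both heads.
          n_sims = 2_000_000
          p = rng.uniform(0, 1, n_sims)               # "true p" varied between 0 and 1
          throws = rng.random((n_sims, 16)) < p[:, None]
          keep = throws[:, :14].sum(axis=1) == 10     # condition on 10 heads in 14
          hh = throws[keep, 14] & throws[keep, 15]

          print("simulated P(HH | 10 heads in 14):", hh.mean())

          # Reference points (my arithmetic, not from the thread):
          print("frequentist plug-in (10/14)^2:    ", (10 / 14) ** 2)          # ~0.510
          print("uniform-prior Bayes 11*12/(16*17):", 11 * 12 / (16 * 17))     # ~0.485
          ```

          One design choice worth flagging: drawing p uniformly before conditioning effectively builds the Bayesian’s uniform prior into the study, so the simulated proportion will land near the Bayesian reference number rather than the plug-in one, which seems to be exactly the worry in the parenthetical question above.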

        • It would be better to make three general points:

          (1) Frequentists may have “an” inference, but Bayesian inferences will be conditional on the assumed state of knowledge K. So there will be a different Bayesian inference for each P(heads | K).

          (2) The Frequentist’s goal is to get a precise estimate, which is why they hope their distributions are approximately frequency distributions. But often the state of knowledge K isn’t strong enough to make accurate estimates. There is no theorem in statistics capable of changing this. Attempts to get a precise estimate when the information/data doesn’t support such an estimate amount to assuming knowledge you don’t have (such as knowledge of limiting frequencies) and hoping that it’s accidentally true.

          The Bayesian goal is much more modest, but actually achievable without inventing facts. Namely, the goal isn’t to get a precise estimate, but rather an estimate which isn’t “misleading”. For example, if you estimate f=.5 when the true value is f=.4, then the estimate is a bad one. But if the Bayesian estimate was actually f=.5 +/- .2, then the Bayesian wasn’t being misled by this estimate.

          So if you know nothing and model the coin flips with a uniform distribution, then a Frequentist will view this as wrong because the point estimates are liable to be off significantly. However, the Bayesian estimates will have large error bars and so the Bayesian wasn’t being “misled” by the estimate. That’s not a great state of affairs, but you can’t improve on it without having true additional knowledge about the physical system (knowledge of the coin isn’t sufficient because the outcomes are strongly dependent on the initial conditions of the coin flip). A small numerical sketch of this appears after point (3) below.

          (3) In general, given a state of knowledge K_i you can use this to get a distribution for the sequence of flips P(f_1, …, f_n | K_i). If K_i is true and you’ve translated this into a probability distribution correctly, then the true sequence of flips will be in the high probability region of P(). If you replace K_i by a more informative K_(i+1) then this high probability region will be smaller (i.e. shrink closer to the true sequence). In the limit of perfect physical knowledge needed to predict each flip, the distribution will shrink down to a delta function around the true sequence and you’ll be able to predict the sequence with perfect accuracy.

          So as the distributions become very accurate (K_i is true and highly informative), the marginal distributions P(f_i | K) will get closer to either 0 or 1, depending on whether the true outcome of the ith flip was heads or tails.

          In other words: the most accurate distributions will necessarily use values of P(head) which differ considerably from the frequency of “heads”!

          A Frequentist is then left with two choices. Either never use these more accurate distributions because it’s no longer true that freq = prob. Or invent some kind of ensemble, like Multiple Universes or something, in which you can imagine these more accurate distributions are frequencies over the ensemble.

          If you’re a Frequentist and opt for the latter strategy, then at least stop claiming your approach is objectively based on empirical reality. The reality is you’re just making Universes up based on nothing.
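
          To make point (2) concrete, here is a small sketch with made-up numbers (the flat prior, the 14 flips, and the use of scipy are my own choices): the “know nothing” posterior is wide enough that a long-run value of f = .4 sits comfortably inside it.

          ```python
          from scipy import stats

          # "Know nothing" uniform prior on f, updated on a handful of flips.
          heads, flips = 7, 14
          posterior = stats.beta(1 + heads, 1 + flips - heads)   # Beta(8, 8)

          mean, sd = posterior.mean(), posterior.std()
          lo, hi = posterior.interval(0.9)

          print(f"estimate: f ~ {mean:.2f} +/- {sd:.2f}")        # ~ 0.50 +/- 0.12
          print(f"90% interval: ({lo:.2f}, {hi:.2f})")           # ~ (0.30, 0.70)
          # Even if the long-run value were f = 0.4, it sits inside this wide
          # interval, so the wide Bayesian summary isn't "misleading" -- it is
          # just honest about how little a flat prior plus 14 flips pin down f.
          ```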

        • Again, the nesting limit is causing trouble. It might be interesting to mention here that Persi Diaconis actually built a coin flipping machine that consistently flips essentially exactly the outcome he predetermines. If the “state of knowledge” is that the flip is being made by this machine, then in fact you can predict the exact sequence.

          The preprint of his analysis of bias in natural coin flips, including pictures of the machine, is here:

          http://comptop.stanford.edu/u/preprints/heads.pdf

        • “Again, the nesting limit is causing trouble”

          I agree. The Frequentist idea that prob = freq leads them to believe there is one correct distribution. So they have a hard time with states of knowledge K_1 < … < K_n, each of which is true and each more informative than the last.

          As a concrete realization, how about this:

          K_1: We've measured (or controlled) the initial conditions for the first flip but not the others.
          K_2: We've measured (or controlled) the initial conditions for the first two flips but not the others.
          .
          .
          .
          K_n: We've measured (or controlled) the initial conditions for all flips.
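
          A toy numerical version of that ladder (the model, the numbers, and the entropy bookkeeping are mine, just to illustrate the point): under K_i the first i flips are pinned down exactly, the rest still get P(head) = 1/2, every K_i is true, and the joint distribution over sequences sharpens as i grows.

          ```python
          import numpy as np

          rng = np.random.default_rng(2)

          N = 10
          true_sequence = rng.integers(0, 2, N)   # what the flips actually do

          # Under K_i the initial conditions of the first i flips have been
          # measured, so those outcomes are known exactly; the remaining flips
          # are still assigned P(head) = 1/2.
          for i in [0, 3, 7, N]:
              p_head = np.where(np.arange(N) < i, true_sequence, 0.5)
              # entropy (in bits) of the joint distribution over the 2^N sequences
              with np.errstate(divide="ignore", invalid="ignore"):
                  h = -np.nansum(p_head * np.log2(p_head)
                                 + (1 - p_head) * np.log2(1 - p_head))
              print(f"K_{i}: entropy = {h:.0f} bits, per-flip P(head) = {p_head}")

          # Each K_i is true and more informative than the last, and the per-flip
          # probabilities it assigns drift away from the overall frequency of
          # heads -- which is the point of (3) above.
          ```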

        • Let me understand this: you have a distinction between different views of randomness, such that if we end up believing the wrong version, statistics won’t make progress. Please write this up as soon as possible so that it can be discussed properly.

          Alternatively, much of the generality of statistics stems from it not being based on any view of exactly what randomness means. Sometimes agonising about foundations makes as much sense as looking at a map through a microscope.

        • It’s not my distinction. It’s the distinction between von Mises/Fisher/Neyman-Pearson versus Laplace/Keynes/Jeffries/Jaynes. I think those luminaries have written plenty on it.

          As to the other part, I disagree completely. Much of the generality of statistics comes from the fact that sometimes you can get good answers by taking the “majority vote” mentioned above without having any additional information or data. But to pull this off consistently you’d need to understand version 2 above, so foundations matter plenty.

        • I will settle for one explicit reference from any of those authors explaining the distinction in terms resembling yours. For Jeffries read Jeffreys, by the way.

        • I’ll give you two:

          http://www.amazon.com/Probability-Theory-The-Logic-Science/dp/0521592712/ref=sr_1_1?ie=UTF8&qid=1370401593&sr=8-1&keywords=e.+t.+jaynes

          http://bayes.wustl.edu/etj/node1.html

          The “majority vote” language comes directly from Jaynes. The last comment, “Or to phrase it in a way relevant to Physicists: if we do observe f~.5 it tells us almost nothing about state(t_i) -> state(t_i+1)”, is the essence of the whole information theory stuff in statistics (the entropy of the f~.5 outcome is going to be very high, indicating that you’ve learned little by observing it). It’s scattered all about Jaynes’s work, as well as that of many others.

          Version 1 is an attempt to replace physics, which is probably why it didn’t sit well with physicists like Laplace, Jeffries, and Jaynes. It’s clear from version 2 why a result like f~.5 doesn’t supplant or in any way contradict known physical laws related to coin flips. You can find that idea scattered about Jaynes and probably many others as well.

          Anyway, those two links have everything Jaynes wrote. So there you go.

  6. Being advised to read the whole of Jaynes was not quite what I had in mind as a single explicit reference. But you shouldn’t feel obliged to debate as I ask any more than I would feel obliged to debate as you might ask.

    You often post long comments on inference in this blog. I am just trying to work out if you are claiming to have novel insights, or paraphrasing what you regard as a standard view, and you’ve made that clear. Thanks!

    • It would take some time to go through and narrow it down to a shorter list, but there is absolutely no need to consider who said what. All you need do is find a place where the two versions disagree and then see for yourself which one gets it right.

      Also, the information theory stuff mentioned didn’t originate historically with Bayesians. The first mention of similar ideas was by Fisher, using Fisher information (which is an approximation to entropy). So that’s perhaps another reference:

      http://en.wikipedia.org/wiki/Fisher_information
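
      As a side note, if anyone wants to see what that quantity looks like in the simplest coin case, here is a quick numerical check (entirely my own example, not anything from the comment above) that the average curvature of the Bernoulli log-likelihood matches the textbook Fisher information I(p) = 1/(p(1-p)):

      ```python
      import numpy as np

      rng = np.random.default_rng(3)

      p = 0.7
      n = 1_000_000
      x = rng.random(n) < p   # simulated Bernoulli(p) flips

      # log-likelihood of one flip: x*log(p) + (1-x)*log(1-p)
      # negative second derivative w.r.t. p: x/p**2 + (1-x)/(1-p)**2
      empirical = np.mean(x / p**2 + (1 - x) / (1 - p)**2)

      print("empirical Fisher information:", empirical)           # ~ 4.76
      print("analytic 1/(p(1-p)):         ", 1 / (p * (1 - p)))   # = 4.7619...
      ```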
