(from a 1922 review of Keynes’ Treatise on Probability)

“To the statistician probability appears simply as the ratio which a part bears to the whole of a (usually infinite) population of possibilities. Mr. Keynes adopts a psychological definition. It measures the degree of rational belief to which a proposition is entitled in the light of given evidence.”

(from the discussion on a 1935 paper)

“following Bayes, and, I believe, most of the early writers, but unlike Laplace, and others influenced by him in the nineteenth century, I mean by mathematical probability only that objective quality of the individual which corresponds to frequency in the population, of which the individual is spoken of as a typical member.”

The latter is available here: https://www.scribd.com/document/329753413/The-Logic-of-Inductive-Inference

Another quote from it:

“Although some uncertain inferences can be rigorously expressed in terms of mathematical probability, it does not follow that mathematical probability is an adequate concept for the rigorous expression of uncertain inferences of every kind. (...) If it appears in inductive reasoning, as it has appeared in some cases, we shall welcome it as a familiar friend. More generally, however, a mathematical quantity of a different kind, which I have termed mathematical likelihood, appears to take its place as a measure of rational belief when we are reasoning from the sample to the population.”

“I’d love to discuss more if you have quotes/references that disagree.” Thanks Anon! I may collect some more info but I don’t expect to become worthy of an expert’s extended time on the topic of Fisher! – though I will take up your offer if I can get away with it!

Ok, but all I was wondering is what made you say this:

Fisher [did] as much as anyone else to block Bayes.

You shouldn’t need any additional info collection. Is it just an impression you got from somewhere or what?

I’ve a feeling Sharon B M said in her YouTube talk that Fisher had had discussions with Turing – who used Bayes in his decryption work.

Also, his dislike of priors seems very old-fashioned to those of us brought up with iterative techniques. Things were different in the pre-computing days; today I for one tend to see “priors”, to the extent that I understand them (which isn’t far), as just the initial values you plop in because you’ve got to start somewhere. Lucky initialisation might speed things up, but the details won’t matter much in the end. The deep influence of computers can easily be underestimated. Two good old boys from my U/G days are a good example. J Maynard Smith said “any idiot can write a computer program” – and he meant it! And dear Margaret Boden is, I think, uninfluenced by some crucial subtleties of experience that writing a rattling good cognition program or two could lend. (Not certain about the last one, but strong suspicions. Hope she doesn’t read this!)
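The remark about priors as starting values can be illustrated with a minimal Python sketch. It assumes a Beta prior on a success rate updated with binomial data, which is only one convenient conjugate case; the function name `posterior_mean`, the two priors, and the counts are all made up for illustration. With plenty of data the two priors give nearly the same answer, while with only ten trials they do not, so how much the details matter depends on how much data there is.

```python
def posterior_mean(prior_a, prior_b, successes, failures):
    """Posterior mean of a success rate under a Beta(prior_a, prior_b) prior
    after observing the given binomial counts (standard conjugate update)."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)


# Two deliberately different, hypothetical priors.
priors = {
    "flat Beta(1, 1)": (1, 1),
    "skeptical Beta(2, 20)": (2, 20),
}

# Same observed rate (70%) at two very different sample sizes.
datasets = {
    "10 trials": (7, 3),
    "10000 trials": (7000, 3000),
}

for data_label, (s, f) in datasets.items():
    for prior_label, (a, b) in priors.items():
        mean = posterior_mean(a, b, s, f)
        print(f"{data_label:>12} | {prior_label:<22} | posterior mean = {mean:.3f}")
```
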

“I’d love to discuss more if you have quotes/references that disagree.” Thanks Anon! I may collect some more info but I don’t expect to become worthy of an expert’s extended time on the topic of Fisher! – though I will take up your offer if I can get away with it! Mind you, I never expected to get where I’ve got in the last few months. It was only last month that I discovered Sewall / Wright were two names of the same person, not two people! Even if I can pose as a genetics expert it would be years before I could get to grips properly with stats, and these days I ought to be planning and assigning my remaining years carefully :-S

Yes, yes, indeed, indeed, thanks, it was Metropolis. I realised I’d got that wrong a few hours after writing it! It couldn’t be from the later Hastings work because “Monte Carlo” was a deliberately uninformative code word, in the style of the early Cold War and presumably influenced by WWII habits. By 1970 they wouldn’t have bothered. In fact originally I’d thought Metropolis and Hastings were code words directly influenced by “Manhattan” and “Monte Carlo” :-)

Yes and no.

If you can walk someone through Galton’s two stage quincunx it will likely be true that it’s very clear to them what is happening.

But that does not mean that they will or even can connect the underlying logic (how to profitably learn from observations) to anything else – especially pragmatic Bayesian analyses with modeling justifications, criticism, predictive checking and adequate processing of posterior quantities. A point first made to me by David Spiegelhalter.

For instance, just providing a sense of a reality beyond direct access (always the case in statistics, as opposed to math) that has to be represented (as a model, in order to do any analysis) creates a lot of confusion – https://galtonbayesianmachine.shinyapps.io/GaltonBayesianMachine/

I don’t think anyone has fully worked through a full undergrad non-math course starting with just Bayes, sticking with it, and introducing sampling distributions as simply an aspect of prior/posterior predictive quantities. I recently read what Tim Hesterberg (The American Statistician, 2015) argues needs to be covered to safely teach the bootstrap in undergrad, and I suspect it won’t be as hard for many.

Then I ask (cheating of course) “Suppose the test is positive? What’s the probability that the subject has breast cancer?”

One of the innovative features of Gigerenzer’s book is his use of what he calls “natural frequencies” to explain the basic ideas of Bayesian inference without using Bayes’ theorem, in a way that appeals to lay people and which can easily be explained to people even without writing anything down, in simple cases. See here:

https://en.wikipedia.org/wiki/Gerd_Gigerenzer#Risk_communication

I have often run into people who hated their experience in a statistics class (often on airplanes, where a seat companion asks what you do and, on learning you are a statistician, says how much they hated the course they took). I have used that opportunity many times to give simple examples, just in words, of how different the Bayesian approach is. For example, I can say (following an example in Gigerenzer’s book, so these numbers may have changed) that a mammogram, in the usual population tested, is 90% accurate: if the subject has breast cancer, the test will be positive 90% of the time, and if not, it will be negative 90% of the time. Then I ask (cheating of course) “Suppose the test is positive? What’s the probability that the subject has breast cancer?” “90%” is the usual reply (which, according to Gigerenzer, is what a large fraction of oncologists surveyed also answer). But then I note the prior: in that population, only 1% have cancer. So, out of a group of 1000 women (this is using Gigerenzer’s “natural frequencies” method), 10 will have cancer, of whom 9 will test positive. Of the remaining 990 women who do not have cancer, 99 will test positive (false positives). So 9/(9+99), or about 8%, of those who test positive actually have the disease. No mention of Bayes’ theorem. No writing stuff down, and it’s intuitively clear what’s happening. Most of the people I’ve had this discussion with are quite interested.
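For anyone who does want to write it down, the counting argument above can be checked with a short Python sketch. The function name `natural_frequencies` and its parameter names are my own illustrative choices, not anything from Gigerenzer’s book; the numbers are the ones quoted above (1000 women, 1% prevalence, 90% of cancers detected, 90% of healthy women correctly cleared).

```python
def natural_frequencies(population, prevalence, sensitivity, specificity):
    """Count outcomes in a hypothetical screened population and return
    (true positives, false positives, share of positives who are ill)."""
    with_cancer = population * prevalence
    without_cancer = population - with_cancer
    true_positives = with_cancer * sensitivity
    false_positives = without_cancer * (1 - specificity)
    share_ill = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_ill


tp, fp, share_ill = natural_frequencies(
    population=1000, prevalence=0.01, sensitivity=0.90, specificity=0.90
)
print(f"{tp:.0f} true positives, {fp:.0f} false positives")
print(f"P(cancer | positive test) = {share_ill:.3f}")
```

Changing `prevalence` while leaving the test’s accuracy fixed shows how strongly the answer depends on the prior rate in the population tested.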

The book does not discuss loss functions, which I covered in my course using other reading materials. He may have discussed this in his more recent (2014) popular book, “Risk Savvy: How to Make Good Decisions”, but I haven’t read it.

The attempt made by Bayes, upon which the determination of inverse probabilities rests, admittedly depended upon an arbitrary assumption, so that the whole method has been widely discredited.

(1921). “On the ‘Probable Error’ of a Coefficient of Correlation Deduced from a Small Sample.” Metron, 1: 3-32.

Such inferences are usually distinguished under the heading of Inverse Probability, and have at times gained wide acceptance. This is not the place to enter into the subtleties of a prolonged controversy; it will be sufficient in this general outline of the scope of Statistical Science to reaffirm my personal conviction, which I have sustained elsewhere, that the theory of inverse probability is founded upon an error, and must be wholly rejected. Inferences respecting populations, from which known samples have been drawn, cannot by this method be expressed in terms of probability, except in the trivial case when the population is itself a sample of a superpopulation the specification of which is known with accuracy.

(1925a). Statistical Methods for Research Workers. Edinburgh: Oliver & Boyd, 1st edition.

Abstract. Ronald Fisher believed that “The theory of inverse probability is founded upon an error, and must be wholly rejected.” This note describes how Fisher divided responsibility for the error between Bayes and Laplace. Bayes he admired for formulating the problem, producing a solution and then withholding it; Laplace he blamed for promulgating the theory and for distorting the concept of probability to accommodate the theory. At the end of his life Fisher added a refinement: in the Essay Bayes had anticipated one of Fisher’s own fiducial arguments.

Interesting, I think that might apply to CS Peirce, but I have not done the scholarship necessary to make such a claim.

Fisher, like almost anyone else, would use Bayes if he had what Don Fraser has termed genuine prior knowledge, where the variation in parameters is aleatory rather than epistemic, that is, where there is a physical basis for the variation (e.g. cancer baseline rates).

For instance, Andrew and I quote Reid and Cox – “It seems to be already widely supported for probability generating models for data ‘[providing an] explicit description in idealized form of the physical, biological, . . . data generating process,’ that is essentially ‘to hypothesize a data generating mechanism that produces observations as if from some physical probabilistic mechanisms’”

While in contrast we claimed that “We believe scientific research would be more effective if statistics was viewed instead as primarily about conjecturing, assessing, and adopting idealized representations of reality, predominantly using probability generating models for both parameters and data …”

Now, David later disagreed strongly by email with the value of ever using a probability model for quantifying [epistemic] uncertainty, which he said “is often (nearly always?) a bad idea”. I do believe that he is summarizing Fisher’s view here.

Modernity starts with the great anti-Bayesian Ronald Fisher

But you don’t provide any source for this. Do you have any quotes in mind from Fisher that make you think he is actually “anti-Bayesian” vs some distinction like “anti-uninformed priors”, etc?

Fisher [did] as much as anyone else to block Bayes.

It’s been a while since I read him closely, but Fisher never really seemed anti-Bayesian to me. I remembered this quote; maybe he changed his mind when he got older?

Now suppose there were knowledge a priori of the distribution of mu. Then the method of Bayes would give a probability statement, probably a different one. This would supersede the fiducial value, for a very simple reason. If there were knowledge a priori, the fiducial method of reasoning would be clearly erroneous because it would have ignored some of the data.

Fisher, R A (1958). “The Nature of Probability”. Centennial Review. 2: 261–274. http://www.york.ac.uk/depts/maths/histstat/fisher272.pdf

IIRC, he was much more anti-Neyman-Pearson (power, alternative hypothesis, accepting hypotheses, etc) than anti-Bayes. It’s more the NHST advocates, who are defending the indefensible and know pretty much nothing of how their methods were developed, that are anti-Bayes.

Fisher does not seem to have been an NHST-advocate or user, in fact (unlike Student and Neyman) I have always seen him be very careful not to draw conclusions beyond the precise null hypothesis being tested. In fact, once again IIRC, the smoking issue you bring up was precisely him pointing out alternative reasons for rejecting the null hypothesis that cancer rates were equal amongst smokers and non-smokers. Also, I’d be careful with allowing any medical claim at all to have the status as “fact” in your worldview…

I’d love to discuss more if you have quotes/references that disagree. This is mostly from memory of back when I first discovered the true nature of NHST.

Regarding unreasonable and ignorant resistance to Bayesian methods, see this article with Christian Robert and our rejoinder to discussion.

P.S. According to Wikipedia, Nicholas Metropolis came up with the term “Monte Carlo method” in the 1940s.

“Knowledge and its Limits”, by Timothy Williamson

and

“Causality”, by Judea Pearl

“The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from…” https://www.amazon.co.uk/Theory-That-Would-Not-Die/dp/0300188226

…but I’ve seen her Youtube video:

https://www.youtube.com/watch?v=2o-_BGqYM5U

I didn’t realise it wasn’t just ignorance but stubbornness that blocked Bayes for so long. Interestingly it is mirrored by and overlaps with the antagonism by biological geneticists to investigation by computer simulation. I shall be writing about the big exception: MCMC, but I didn’t realise even this was given such a hard time. I can only guess how much damage the persecution of Bayesianists has done to science/stats/civilisation in general, but I’ve discovered for myself that just anti-computer-simulation has crippled our understanding of the Darwinian mechanism by an unnecessary 40 years.

Even from her video we can see that one of the great heroes of computational science, Keith Hastings (who I believe invented the name Monte Carlo) pretty well had his career terminated for his efforts.

This putting of enormous effort into opposing the best discoveries seems unfortunately to go beyond “recurring theme” status; it seems to be one of the main activities of scientists. We are now familiar with Fisher’s claims that smoking does not cause cancer, but he also claimed cancer causes smoking, while doing as much as anyone else to block Bayes. Haldane’s 1957 paper, currently adhered to in pure form by huge numbers of geneticists, has done even more to block our understanding of genetics than Fisher did to block stats. It’s important that everyone entering science and maths is told the truth: that the biggest problem in science is scientists.

Beyond that, there are other books on the list that I would also agree with, but I won’t go through them in detail.

Books not on the list, a bit further from statistics, but relevant for building complex mathematical software

Computational math

Strang – Computational Science

Programming (David MacKay had many strengths – programming wasn’t among them that I could see).

Gries – Science of programming

Structure and interpretation (of course)

Lots of other things, but certainly MacKay, SICP, Gries and Strang are books I would run into a burning house to save

My priority has been to understand what these books suggest. And if these resources haven’t yet settled some of the intellectual challenges we face, then we need to discuss their strengths and drawbacks. Much writing is so poor that it becomes laborious to distill the argument without excruciating mapping.

I think people, myself included, read fewer books now that it’s easier to find articles, case studies, and blog posts on the internet.

I think the same. I’ve seen too much wrong, out-of-date, or not-even-wrong stuff stated with authority in textbooks. I say that as someone who actually did read them cover to cover for classes at one point.

The most recent thing I remember coming across was a neuroscience book which claimed something like “there are x neurons in the human brain and y connections between them” without offering any citation. How was it measured, do other methods of estimation give similar values, etc? All the actual scientific content was missing, replaced by authoritative claims I was supposed to memorize. At some point I just stopped bothering with textbooks altogether due to anti-scientific stuff like that. And of course, all here know about how NHST was created purely by stats 101 textbook authors who got confused between N-P and Fisher.

The books that have had an influence on me personally are more commonly used in signal processing and communication theory higher education. Some of them are not in the list:

– Papoulis, “Probability, Random Variables and Stochastic Processes”: Provides an intuitive yet compactly rigorous introduction to the theory of probability, statistical inference and (discrete time) stochastic processes. In graduate school, I remember going through the chapters of this book several times.

– Cover & Thomas, “Elements of Information Theory”: Again a graduate textbook which has a wide coverage of information theory and related topics.

One book which shaped my research vision later on was

– R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter, “Probabilistic Networks and Expert Systems”, Springer, 1999: A very good book on probabilistic graphical models and inference algorithms, particularly for developing a fundamental understanding of Markovian relations on general graphs.