I’m not really convinced by the idea that you set up a model, claim that it only models knowledge and not the data generating process, then see some data from that process and claim, “oh, the knowledge I had before the data was actually different from how I initially set up my model”, and change your prior. You may be right that formally one can show that under so-and-so conditions you can only get things so-and-so wrong, but I’d expect that in practice one wouldn’t normally worry much about these so-and-so conditions and whether they are fulfilled, and it’s hard to do so anyway if you license yourself to change the prior model in all kinds of ways after seeing the data. Certainly it’s not transparent unless you specify everything in advance; otherwise it’s a fine Bayesian garden of forking paths, isn’t it?

I’d still claim that your Friday model is stronger than your knowledge. You don’t actually know that your observations on Friday are symmetric; you merely know of no reason why they shouldn’t be. If you only take 10 observations on Friday, it may not matter. If you take 10,000, a good data analyst may well spot a difference that you have committed yourself to ignore by using exchangeability, and not because any proper knowledge forces you to do that.

The point being: 1/2, 1/2 and 1/2 instead of 1/3, 1/3 and 1/3 also satisfies everything except normalisation. But dropping normalisation blocks key steps of Cox’s argument.

Anyway, this feels pretty pointless – some people will never be convinced to consider alternatives once they’re committed to their axioms. I’ve tried many times to raise the simple possibility that the Jaynes–Cox argument is nowhere near as convincing a case for ‘the one true way’ as it is made out to be. Maybe alternatives are also interesting?

So again, I’m not saying probability is not a useful tool; I’m saying there are many reasons why Jaynesian proselytizing falls on deaf ears.

I’ll leave Andrew to confirm, deny or remain silent on your and my interpretation of his philosophy.

Andrew’s been pretty clear on why he doesn’t like probabilities for models — it’s because they have a sensitive dependence on details of the prior that have negligible effect on inferences *within* the model. Also because the Occam factor argument that physicists tend to use in favour of that dependence places a value on simplicity which is inappropriate in the social science context of the models he works with.

Regarding predicate calculus and quantifiers and such, all I can say is that I’ve never felt the lack.

“we may conclude that there is no analog, or at least complete analog, in the algebra of systems [of propositions], to contradiction in the algebra of propositions”

In his notation there is no A or ~A for systems of propositions. The lazy answer to this is ‘we only need propositional logic’, but that program seems to me to have failed in all foundational mathematical and philosophical projects.

If you have some doubts about whether there should be a trend or a change point or some such thing that eliminates exchangeability, then there’s nothing keeping you from going back and putting those in the model when they seem to be warranted after seeing the data. We’ve been over this before about how any given Bayesian analysis is always some kind of truncated asymptotic expansion of the full set of models you’d be willing to entertain, and when that expansion becomes singular you’re free to go back and add in the “missing” components.

Quickly – the point is that probability is an additive measure of uncertainty. This makes sense for observables. Not so much for unobservables, for which our uncertainty is often non-additive. This comes up all the time in practice when dealing with non-identifiable models.

Why did you say to use the ‘obvious’ equivalence class above? To restore additivity it seems to me.

Why doesn’t Andrew like the idea of the probability of a model, preferring predictive checks? I’d argue it reflects an at least implicit recognition of non-additive uncertainty.

“Once more, if you start from an exchangeability model, no observation whatsoever can get you out.”

This is simply not true. Suppose on Friday I evacuate a tube and use it, with several timers, to drop balls in the absence of air resistance. I decide my apparatus is well behaved and all the observations are exchangeable.

On Monday I come back and drop the balls in free air… there is absolutely nothing that prevents me from using this information to change the model I use for the errors in the Monday dataset. The Bayesian way is to condition on a set of knowledge, and the knowledge I have about the Friday dataset is different from the Monday dataset, so I can immediately apply it to my model. This holds for any and all future experiments that I do as well.

Exchangeability is a property of the knowledge set used for assigning the probability expressions.

If you follow Jaynes formally, this means that you define the model and some alternative in the framework of a supermodel, and goodness-of-fit testing shifts probabilities around between submodels of the supermodel, which in itself you can’t test. However, it may well be that Jaynes also made remarks to the effect that you can test the supermodel without setting up a Bayesian super-supermodel, which then means that you step outside the Bayesian framework. I have read the Probability Theory book, but I can’t claim to be the very best expert on Jaynes, and can’t claim for sure that he consistently sticks to the Bayesian setup overall.

What I mean is still that epistemic probabilities do not model the data generating process in itself, only a person’s knowledge about it, and this holds for “sampling models” as well, and Jaynes is quite clear about this in his book.

Daniel: You know the wrong frequentists. Frequentists do model checking. Empirically I’m fairly sure that they do more of it than the Bayesians (Andrew may have some good influence there).

If you get the same people you are referring to to do a Bayesian analysis instead, they will make the same hash of it.

“Suppose a Bayesian is told that a computer is outputting a sequence of random numbers from some univariate distribution with a mean and standard deviation… Are you saying this Bayesian isn’t logically warranted in collecting a large set of data, and then define say a Gaussian Mixture Model (GMM) with a large number of parameters, and then fit those parameters in such a way that they specify p(Data | Parameters) = GMM(Data | Parameters) because this GMM doesn’t represent a state of knowledge it represents a physical frequency enforced by computer program?”

I don’t know whether this was for me, but I’m not the one who is overly prescriptive here. Surely the Bayesian can do that but still it’s either fully aleatory or fully epistemic, and I’d like the Bayesian to tell me what it is.

ojm, if I have multiple mutually incompatible models that can be equally consistent with my observables, I can’t see how rejecting “P xor not-P” helps me; the issues seem orthogonal to me. What connection do you see?

I don’t recall Cox investigating vectors of propositions *per se*. I do recall him working on a logic of questions which did involve systems of collections of propositions; IIRC the aim was to provide a quantitative relationship between the informativeness of answers to various questions in the same way that Cox’s theorem gets at a quantitative relationship between the plausibilities of uncertain propositions. There’s a guy named Kevin Knuth who has picked up and continued that program.

I’ll have a look at Dubois’s stuff. I started the Colyvan paper and it’s making me grit my teeth — both Cox and Jaynes were very clear from the outset that they did not purport to quantify all uncertainty (like the definitional uncertainty Colyvan is going on about) but only uncertainty about propositions for which “P xor not-P” makes sense. But I’ll persevere… (A dude named Alain Drory pulled a similar trick about Jaynes’s analysis of Bertrand’s problem: he set up a straw man, misattributed it to Jaynes, and then knocked it down by arguing for the position Jaynes actually held.)

Once more, if you start from an exchangeability model, no observation whatsoever can get you out. It means that you don’t only treat the situation as symmetric before observations; it means that you commit yourself to treating it symmetrically forever, regardless of what the observations are.

I don’t think that such knowledge ever exists. I rather think that the exchangeability model is an idealised *model* of the knowledge, ignoring some of its complicating subtleties, just as the frequentist model is an idealised model of the data generating process that commits us to ignore some subtleties for the sake of simplicity.

Suppose a Bayesian is told that a computer is outputting a sequence of random numbers from some univariate distribution with a mean and standard deviation… Are you saying this Bayesian isn’t logically warranted in collecting a large set of data, and then define say a Gaussian Mixture Model (GMM) with a large number of parameters, and then fit those parameters in such a way that they specify p(Data | Parameters) = GMM(Data | Parameters) because this GMM doesn’t represent a state of knowledge it represents a physical frequency enforced by computer program?

That seems extremely odd to me.

My complaint is that the Frequentist is willing to go the other route: before seeing any data, they simply specify that the GMM is actually, definitely a one-component mixture with unknown mean and standard deviation, and then proceed to “guarantee” that their interval estimates of the mean and standard deviation are right 95% of the time that they collect data, even though they haven’t established in any way that this initial assumption of gaussianity is relevant. (The same thing happens with lognormals, gammas, betas, whatever: they choose a particular family based on a hunch, and maybe a failure to reject using a test, and then predict frequencies from it!)

What’s more, they tend to apply this in settings like 30 samples from human randomized controlled drug trials, where serious deviations from whatever your assumed shape is are essentially guaranteed, at least in the medium-short run (if you do this in 5 or 10 different problems, one of them will have a serious violation that you can’t detect because you only have 18 data points).
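A quick simulation (my own toy version, with arbitrary numbers, not anything from the thread) of how those “guaranteed” intervals degrade when the assumed family is wrong: nominal 95% normal-theory intervals for the mean of skewed (lognormal) data under-cover at small n.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
true_mean = np.exp(0.5)  # mean of a lognormal(0, 1)
n, reps, hits = 30, 2000, 0
for _ in range(reps):
    x = rng.lognormal(0.0, 1.0, size=n)
    # standard t-interval, derived under an assumption of normality
    half = stats.t.ppf(0.975, n - 1) * x.std(ddof=1) / np.sqrt(n)
    hits += abs(x.mean() - true_mean) < half
coverage = hits / reps
print(coverage)  # noticeably below the nominal 0.95
```

The shortfall here is modest because the lognormal is only moderately skewed; with heavier tails or smaller n it gets worse.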

Is Probability the Only Coherent Approach to Uncertainty?

I have no problem with dealing with stable frequencies, I just want to concentrate the inevitable Bayesian uncertainty around a particular shape of the distribution before doing the frequency calculations.

The Bayesian can say “the frequency of outcomes is f(Data | Params)” and these are exchangeable among the observed data and all the future data of interest to me… But this is a model assumption that the Bayesian doesn’t HAVE to make. The Bayesian can simply say “my probability over what would occur is p(Data | Params) for the particular set of data run in this particular subset of my current experiment” and the assumption means nothing about symmetry of the physics, it means only symmetry of the knowledge.

On the other hand, I see no way around assuming symmetry of the physics in Frequentist logic.

As I mentioned, Jaynes discusses “goodness of fit” tests. If this is not testing a model against data, what do you mean precisely?

Gelman and Shalizi in their “Philosophy and the practice of Bayesian statistics” list Jaynes as one of the writers who “emphasized the value of model checking and frequency evaluation as guidelines for Bayesian inference” and later single out Jaynes in particular: “A more direct influence on our thinking about these matters is the work of Jaynes (2003), who illustrated how we may learn the most when we find that our model does not fit the data – that is, when it is falsified – because then we have found a problem with our model’s assumptions.”

Are they talking about testing a model against data?

But that’s the thing: Jaynes models the state of knowledge of his robot, not the data generating process. Which means that the full model (I’m not talking about shifting probabilities within the full model around between submodels) cannot be tested against data, because it is simply not about where the data come from. Daniel, who surely has a good grasp of Jaynes-type Bayesianism, explains the same thing above: “If you collect an infinite amount of data on the Bayesian model, you can not falsify the model with observed frequencies, because the model doesn’t specify the frequencies it specifies the knowledge you had at the beginning about unobserved things.”

Also I think that you’d need to decide: Are your probabilities epistemic or aleatory? I don’t think it works to have an epistemic prior and an aleatory sampling model, because none of the approaches to the foundations of probability licenses you, in this case, to use both of these kinds of probabilities in the same calculus.

Why isn’t it desirable to assume P or not P in general? Well, identifiability for one: multiple mutually incompatible models can be equally consistent with your observables. There is a uniqueness issue. I saw a generalisation of Cox’s approach to constructive logic, in which case P or not P also drops out. Note also in Cox’s book that when he extends to vectors of propositions he also obtains a weaker logic than Boolean logic. I feel like Jaynes didn’t read that far or something.

“It is a foregone conclusion before you collect any data that your frequency model is false.”

It’s a model. If you take it too literally, it’s false, yes. Same with Bayesian models of knowledge.

“Suppose for example that you are modeling stock market returns, and we are all actually living in a computer simulation. In the computer simulation daily returns are *actually* normal(0,1) * 0.9997 + cauchy(0,1) *0.9993 then there IS NO average and yet you find this out only after something like 10000 days and bankrupting an entire industry.”

…which has nothing to do with whether your model is Bayesian or frequentist; using a plain normal model in a setup that is prone to outliers will get the Bayesian as easily into trouble as the frequentist.

“Most of the problems with (1) come from bolting on things that fail to follow logic (such as p < 0.05 means round-off to 0.0) or failing to have a way to make assumptions other than ‘irreducible uncertainty with stable patterns’”

Nothing in the frequentist interpretation of probability enforces treating p<0.05 as 0.0, and frequentists can make all kinds of assumptions that the Bayesian can make, because every Bayesian model can be given a frequentist interpretation.

A situation can easily arise where you do care about predictions — your action space and loss function could be predictive after all — and then you’d best start caring about predictive calibration… There’s an example in BDA in which a log-normal model works very well for one estimand and disastrously for another due to failure to get the tail correct.

Elsewhere I’ve made the point that among all of the isomorphic plausibility systems allowed by Cox’s theorem, only one — probability — appears in the Law of Large Numbers for exchangeable random variables, and it’s this strong connection between probability and expected frequency that helps resolve the underdetermination of the Cox theorem result. I can’t have my cake and eat it too — and in any event, I’ve always been more concerned than you and Joseph about the lack of predictive calibration inherent in the truth-in-the-high-density-region-is-good-enough stance.
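For what it’s worth, the frequency link is easy to demonstrate numerically (a trivial sketch of my own, not anything from the comment):

```python
import numpy as np

rng = np.random.default_rng(4)
p = 0.3  # probability assigned to an event
draws = rng.random(100_000) < p  # exchangeable Bernoulli draws
# Law of Large Numbers: the empirical frequency approaches the probability
print(abs(draws.mean() - p) < 0.01)  # True
```

Any monotone rescaling of probability would satisfy Cox’s functional equations, but only the probability scale itself shows up as the limit of this empirical frequency.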

I think the likelihood represents a model of the world in the Bayesian setting as it does in the frequentist setting. I still don’t get your point. What kind of evidence regarding the model do alternative inference frameworks provide which is missing in the Bayesian framework?

p(X | K) = product(p(x[i] | K), i = 1..N)

This holds because K is a constant, rather than something that changes for each data point. As soon as you have a time-series structure, or a potential change-point model or whatever, this symmetry property doesn’t hold and you don’t have exchangeability. For example, you might use a Gaussian process for a signal through time, where every data point is really just part of ONE function whose values you observe with error at various times or places. The fact is, you’re putting probability on *one* object, a vector of observations.
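To make the symmetry point concrete, here is a small numerical sketch (my own illustration; the models and numbers are made up): the i.i.d. factorization above is invariant under permutations of the data, while a model with serial structure is not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=5)
x_perm = x[::-1]  # the same observations, reordered

# i.i.d. model: log p(X | K) = sum_i log p(x[i] | K) -- permutation invariant
iid_logp = stats.norm.logpdf(x, 0.0, 1.0).sum()
iid_logp_perm = stats.norm.logpdf(x_perm, 0.0, 1.0).sum()

# AR(1)-style model: each point depends on its predecessor, so the joint
# probability is over ONE ordered vector and exchangeability fails
def ar1_logp(y, phi=0.8, sigma=1.0):
    lp = stats.norm.logpdf(y[0], 0.0, sigma)
    lp += stats.norm.logpdf(y[1:], phi * y[:-1], sigma).sum()
    return lp

print(np.isclose(iid_logp, iid_logp_perm))  # identical under permutation
print(ar1_logp(x) - ar1_logp(x_perm))       # nonzero in general
```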

Though in actual fact, most biochemistry examples I’ve done involve “these are all from one batch, and then this bunch are another batch, and Joe did some extra ones… and then we repeated our first batch with the reagents we re-ordered from a different supplier…”

So that in the end, exchangeability is only within small groups of data points.

Taken as an expression of symmetry of knowledge, Bayesian IID exchangeability is uncomplicated. Taken as an assertion about the world that *frequencies of outcomes* will be constant, it’s a highly objectionable assumption, essentially always wrong on its face.

The exchangeability is in the applicability of a single simplified state of knowledge to the assignment of the probabilities.

In case it was me: F ~ normal(m,s) with 3000 observations is intended to inform me about m and s, which are uncertain quantities that have priors. I never make any predictions about the F vector; I only plug in the F values I collect and make predictions about m and s.

To me the larger point is this:

1) Frequentist logic is 2 valued logic with irreducibly uncertain outcomes that follow stable patterns through time.

2) Bayesian logic is real-valued logic with reducibly uncertain outcomes that follow rules given by physics and chemistry and biology to within a range of prediction given by a real-valued weight function over conceivable outcomes.

3) (2) completely contains all the logical parts of (1) as a special case

4) Most of the problems with (1) come from bolting on things that fail to follow logic (such as p < 0.05 means round-off to 0.0) or failing to have a way to make assumptions other than “irreducible uncertainty with stable patterns”

One can see this as a feature rather than a bug; see Andrew’s and my paper on “Beyond objective and subjective” as linked by Keith further down.

As with Fisher, one can probably occasionally get apparently inconsistent messages from Jaynes. No doubt somebody can explain why they are not really inconsistent. My understanding is that Jaynes embeds the statements cited by you into a general epistemic Bayesian logic, i.e., one had better set up a model that includes the possibility that ball drawing doesn’t work in the way elementary sampling theory would imply, and assign probabilities to that, too. Then obviously data can shift probability away from one submodel to another. Still, data cannot give evidence against the bigger model that was used to allow testing the more restrictive one within it.

Ah! You say it yourself!

Within a supermodel, all this inference can be done for submodels. But what is ultimately modelled is still to what extent one should believe (or not) – even within the submodels – rather than the data generating mechanism itself, which was my original point.

Suppose, with your Frequentist hat on, you say that F ~ normal(m,s); you think that your model has 2 unknown parameters. But in fact your model has 62 parameters or so, and you are specifying 60 of them precisely (they describe the shape of the normal distribution). If I collect, say, 200 data points, I can falsify your model, because your model will predict that F should fall between say 5 and 10 more often than it really does…

If you collect an infinite amount of data on the Bayesian model, you cannot falsify the model with observed frequencies, because the model doesn’t specify the frequencies; it specifies the knowledge you had at the beginning about unobserved things. The Bayesian model F ~ normal(m,s) says: if you tell me m and s, I estimate that each individual F will be in the high probability region of normal(m,s); if you observe F 3000 times, it gives you a joint probability over the 3000-dimensional vector. You will only ever get ONE observation in a Bayesian model. All your data is ONE vector. There is no frequency.

This I think is the essential difference. A Frequentist analysis is falsifiable by observation that the actual data deviates from the assumed sampling distribution. A Bayesian model is only “falsifiable” by Bayesian comparison with an alternative explanation.

Of course, you can use p values for model checking in Bayesian stats, but it’s only relevant when your model is really a Bayesian model over frequency distributions. Then you can find out that indeed you don’t do a good job of matching the real-world frequencies.
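As a concrete sketch of that last kind of check (my own toy example, not from the thread): if the model is read as a claim about frequencies, observed frequencies can contradict it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.standard_t(df=2, size=2000)  # true generator: heavy-tailed t(2)

# fit a normal "frequency model" (moment matching stands in for a
# concentrated posterior over m and s)
mu, sigma = data.mean(), data.std(ddof=1)
pred = stats.norm(mu, sigma)

# predicted vs observed frequency within one fitted sd of the mean
pred_freq = pred.cdf(mu + sigma) - pred.cdf(mu - sigma)  # about 0.683
obs_freq = np.mean(np.abs(data - mu) < sigma)

# omnibus frequency check; the p-value is approximate because mu and sigma
# were estimated from the same data, but the mismatch here is gross
pval = stats.kstest(data, pred.cdf).pvalue
print(pred_freq, obs_freq, pval)
```

With the heavy-tailed generator the Kolmogorov–Smirnov p-value collapses, which is exactly the “matching real-world frequencies” failure mode described above.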

Jaynes spends many pages in his Probability Theory book talking about extracting balls from urns (you can’t get more world-located than that), and the last section in that chapter includes the following remarks (slightly edited):

“Sampling distributions make predictions about potential observations. If the correct hypothesis is indeed known, then we expect the predictions to agree closely with the observations. If our hypothesis is not correct, they may be very different; then the nature of the discrepancy gives us a clue toward finding a better hypothesis. This is, very broadly stated, the basis for scientific inference.”

“In virtually all real problems of scientific inference we are just in the opposite situation; the data D are known but the correct hypothesis H is not. Then the problem facing the scientist is of the inverse type: Given the data D, what is the probability that some specified hypothesis H is true? […] In the present work our attention will be directed almost exclusively to the methods for solving the inverse problem. This does not mean that we do not calculate sampling distributions; we need to do this constantly and it may be a major part of our computational job.”

The likelihood is obviously related to the sampling distribution (although it’s not exactly the same thing; the likelihood is not a probability distribution, because we switch the roles of the variables). It looks like a model of the world to me, at least as much as when the same sampling distribution appears in frequentist methods. Of course probability can mean other things beyond frequencies, but I think sampling distributions are one of the places where the connection appears automatically.

Regarding the evidence for models being “wrong”, you can look at different models (for example a simple model embedded in a more complex model). And there are also ways to do “goodness of fit” tests (I’m not sure if that’s the frequentist alternative that you’re thinking of). Jaynes discusses the issue extensively, with the following conclusion:

“Our discussion of significance tests is a good example of what, we suggest, is the general situation: if an orthodox method is usable in some problem, then the Bayesian approach to inference supplies the missing theoretical basis for it, and usually improvements on it.”

But this is how modelling *in general* works. With Bayesian modelling (of knowledge, rather than of data generating processes) it’s just the same. With exactly the same right I could, if you give me a Bayesian model for anything, ask: how can you be sure that this is an *exact* representation of our knowledge? For all Bayesian models I have seen involving exchangeability (which is pretty much all of them) I *know* that they don’t completely capture my knowledge (or lack of it) about the world, because exchangeability is extremely restrictive and I have never seen any positive proof of it in any situation – so the knowledge model should put some probability elsewhere, shouldn’t it? But that would often be spectacularly complex, and even if it weren’t, one can find other annoying knowledge (or lack of it) about reality that could tear your knowledge model to shreds if you took it too seriously.

But still these can be models that are fine for the purpose and very useful.

(Note by the way that I believe that all useful probability modelling needs to involve an element of repetition modelling, be it frequentist “i.i.d.”, Bayesian exchangeability, or whatever, because we need to make sure that with help of the model we can use the past to make statements about the future. This is a requirement of our reasoning; this is *not* because the world really is like that; we need to do some idealisation violence to the world in order to get this.)

Also you need to understand that the point I’m making is *not* about “belief” that the world really is like that. We use models, frequentist and Bayesian, to achieve something; they are modes of thinking and communication. As such they can work (though of course they won’t always) without the world being “really” like that.

It still looks as if you demand some kind of perfection of frequentist modelling that Bayesian modelling can’t give you either, and actually no modelling can (because it runs counter to the nature of modelling).

I’m fine with doing frequency calculations, it’s just that you need to tell me why you believe f(R) is a property in the world and under what special conditions it is stable.

Typically you might start with: biochemistry tells me that all these particular situations I’m studying in the lab are biochemically similar, so a stable frequency should be observed. Let f(R) be a member of an extremely flexible family of distributions, such as a Gaussian mixture model with 20 components; here are the epistemic priors I have over the 60 quantities that completely describe the 20-component GMM; here is a bunch of data… In the end the epistemic distribution over the 60-vector is narrowly concentrated around some particular value Q in the 60-dimensional vector space, narrowly enough that I can choose Q as a sufficient approximation and call fQ(R) “the frequency distribution”.

As soon as you do that and then start doing frequency calculations, you and I are on board the same spaceship.
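A rough sketch of that workflow (my own illustration; scikit-learn’s variational `BayesianGaussianMixture` stands in for a full epistemic treatment of the mixture parameters, and the data are simulated):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(3)
# pretend lab measurements from two biochemically similar regimes
data = np.concatenate([rng.normal(-2.0, 0.5, 500),
                       rng.normal(1.0, 1.0, 1500)]).reshape(-1, 1)

# deliberately over-flexible family; the fit concentrates on the components
# it actually needs and shrinks the weights of the rest toward zero
gmm = BayesianGaussianMixture(n_components=20, max_iter=500, random_state=0)
gmm.fit(data)

active = int(np.sum(gmm.weights_ > 0.01))
print(active)  # far fewer than the 20 allowed components
```

Once the fit has concentrated, freezing it at its centre Q and reading off fQ(R) as “the frequency distribution” is exactly the hand-off to frequency calculations described above.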

“One interpretation of your comment 1 would be that a Bayesian analysis should never be done as it combines the prior and likelihood e.g. log(posterior) ~ log(prior) + log(likelihood).”

I’d agree but only *if* the prior is interpreted as epistemic and the likelihood is interpreted as aleatory (frequentist); the posterior is then a muddle. Many people seem to have such an interpretation in mind but one can do it in a fully frequentist manner (see above) or in a fully epistemic manner (de Finetti, Jaynes,…).

And this has good reasons because probability calculus can be justified from epistemic axioms, it can be justified (or at least motivated) from properties of relative frequencies, but I’m not aware of any setup of probability calculus that mixes them. If the prior probabilities of parameters are epistemic, and those given the sampling/data model are aleatory, it is not clear what kind of animal the posterior probabilities (derived from both of them) are.

1) With a Bayesian analysis it is sufficient to say “for all I know R is somewhere in the high probability region of p(R)” and it’s justified by the fact that it is explicitly a statement about *for all I know*, that is a state of knowledge.

2) With a Frequentist analysis we are explicitly making the assumption “after many observations the frequency with which R is between r and r+dr is f(r) * dr” or at least f(R) is a good enough smooth approximation. This is a statement about the future evolution of the physics of the world. What justifies it? You need something fairly strong! You need hundreds or thousands of data points.

When you collect 6 months of satellite fly-by images of a forest, select 400 at random from each fly-by, and show me that the frequency of finding a certain quantity of IR emission caused by evaporative transpiration is such-and-such, and that it is stable in each fly-by, I will believe you. Then we can do frequency calculations together.

When you say “the frequency of rate R is f(R)” you are sweeping this longer and more detailed justification under the rug, but it must be justified, and it is justified by … your state of knowledge.

So even f(R) is a state of knowledge; it’s just a state of knowledge about the existence of a function f which, under these conditions, would model how often growth rate R will be seen, together with knowledge about what region of function space f is in (what its approximate shape is). In fact, when you do a Bayesian analysis to fit f(R), you wind up with a posterior distribution over the shape parameters of f which more explicitly models your state of knowledge about f.

In answer to Carlos’ question, this is how you model the unstable atom, you observe many decays and see a regularity that is independent of many many things, and so you decide to assign a frequency distribution with your knowledge that regardless of what conditions you look at it, the duration of time to decay will always be one of the numbers in the high probability region. It’s justified by acquiring a huge number of observations into your knowledge set so that you can produce a sequence of histograms each of which is very similar independent of your manipulations within some regime.
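The stability check described here is easy to sketch numerically (my own toy version, with an assumed exponential decay law and arbitrary units):

```python
import numpy as np

rng = np.random.default_rng(5)
tau = 2.0  # assumed mean lifetime, arbitrary units
# three independent "runs", each with many observed decay times
batches = [rng.exponential(tau, 100_000) for _ in range(3)]
edges = np.linspace(0.0, 10.0, 21)
hists = [np.histogram(b, bins=edges, density=True)[0] for b in batches]
# batch-to-batch histograms nearly coincide: the regularity that justifies
# assigning a frequency distribution to the decay time
max_gap = max(np.abs(hists[0] - h).max() for h in hists[1:])
print(max_gap)
```

A sequence of very similar histograms across runs and manipulations is the kind of evidence that, on this view, licenses the frequency assignment.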

The biggest issue I have in Frequentist statistics is that it is applied automatically *as if the existence of a frequency distribution were guaranteed*. Anyone who wants to do an explicit argument about why the frequencies of a particular scientific process are some function f and justify it with a wide variety of data gets a pass to do Frequentist calculations from me. But some doctors running an RCT on acupuncture with 35 patients or whatever… not so much.

Before I fully respond, could you put your comments in the context of this excerpt from your recent paper:

“A key issue regarding transparency of falsificationist Bayes is how to interpret the parameter prior, which does not usually (if occasionally) refer to a real mechanism that produces frequencies. Major options are firstly to interpret the parameter prior in a frequentist way, as formalizing a more or less idealized data generating process generating parameter values.” http://www.stat.columbia.edu/~gelman/research/published/objectivityr5.pdf

One interpretation of your comment 1 would be that a Bayesian analysis should never be done as it combines the prior and likelihood e.g. log(posterior) ~ log(prior) + log(likelihood). I don’t think you meant that but a recent reviewer made exactly that argument.

Comments 2 and 3 I largely agree with and for 3 I would put this as a case where a literal interpretation of the posterior would be OK.

Why not? Unless I misunderstand what you say, I think Bayesian analysis does precisely work by combining the probabilities of the data generating process (i.e. the probability of data conditional on the parameters) with the probabilities representing the state of knowledge (i.e. the prior distribution for the parameters) to update the state of knowledge (ie. the posterior distribution for the parameters conditional on the data).

How do you use science to predict when is an unstable atom going to decay?

Frequentist probabilities are a model for data generating processes in reality, as Bayesian probabilities (if interpreted in this way) are a model for knowledge in the face of uncertainty.

I think that a major problem in discussions on the foundations of statistics is that people have a too naive idea about models and how they could be “true” or not in reality.

You are apparently opposed to using the frequentist interpretation of probability for modelling anything that doesn’t work very much like a random number generator. I think you have a too narrow idea of how models are used and can be used (actually, reading this again before sending, no, I don’t think that you generally have a too narrow idea, I rather think that you apply a narrower idea to frequentist models than you’d be happy to apply to your own favourite interpretation).

Obviously, applying frequentist models to something other than random number generators requires more idealisation, but that’s just how it is with models. Such idealisations need to be critically discussed, fair enough, and at times one may be convinced that in this-or-that situation it’s really not a good idea, but ultimately if we use mathematical modelling in any way, we won’t get around such issues. It’s just the same with Bayesian epistemic probability modelling; almost all Bayesian models I have seen involve exchangeability assumptions (at least on some level), and I have no reason to believe that anything in reality is exchangeable. Tough luck! This doesn’t make me an anti-Bayesian but it makes me very wary of dogmatic statements of the kind “only such-and-such way of modelling makes sense”.

Reference:

C. Hennig: Mathematical Models and Reality – a Constructivist Perspective. Foundations of Science 15, 29-49 (2010).