Generally, though, I suspect you’re right about use case. I am in the unfortunate business of trying to use statistics to actually say *something* about one-off events. I have the further problem of trying to say something about them while maintaining some integrity. With sufficient modesty in the claim, I can do both, but I’m inherently limited by noise.

Suppose I’ve measured the residual returns on a bunch of stocks on the days of their stock-split announcements. All of these fail to reject 0 for some n-sigma test, either individually or averaged over stocks, or both.

But, I then regress these statistically insignificant residuals on market cap. I find a strongly significant relationship between the stock-split residuals and market cap. Well, that’s weird: didn’t I just decide these announcements were all just draws from random noise?
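This situation is easy to reproduce. Here is a minimal simulation sketch (all numbers hypothetical) of how residuals that are individually insignificant, and insignificant on average, can still show a strongly significant regression on market cap:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Hypothetical standardized log market caps and a small true market-cap effect
x = rng.standard_normal(n)
x = (x - x.mean()) / x.std()
true_beta = 0.3                                  # small relative to the noise (sd = 1)
resid = true_beta * x + rng.standard_normal(n)   # announcement-day residual returns

# Individual 2-sigma tests: almost every stock fails to reject 0
frac_indiv_signif = np.mean(np.abs(resid) > 2)

# OLS of the "insignificant" residuals on market cap recovers the effect
beta_hat = (x * resid).sum() / (x * x).sum()
rss = ((resid - beta_hat * x) ** 2).sum()
se_beta = np.sqrt(rss / (n - 2)) / np.sqrt((x * x).sum())
t_beta = beta_hat / se_beta                      # comfortably above 2 in this setup
```

The per-stock tests each face noise of sd 1, while the regression pools information across all n stocks, so its standard error shrinks like 1/sqrt(n).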

Call p(theta) a parameter model instead of a prior, cool.

Just don’t call p(y|theta) a likelihood and confuse everyone.

We agree low power is a problem if your goal is to accept or reject “no effect.” I think my only point here is, especially with these one-off events, that you’re throwing away a lot of useful information. Indeed the measured residual *may* be all noise and no effect, but I would still like to use my model(s) to learn about more-and-less plausible values of the effect.

The fact that they’re unique (as you say, and I meant to imply, these events are never really identical) means that you need to thoughtfully combine evidence from “similar” events to answer questions. E.g.: Are stock-split announcements more meaningful for some companies than others? Do bad debate performances matter more for front-runners? Just making some binary decision and moving on doesn’t help you accumulate any knowledge. The situation is different when you do have repeatable experiments, because then you have some hope of a single, well-defined signal coming through.

My problem with calling it a causal-inference test is that it’s not one. All stock announcements and all debates have some causal effect on the mechanisms that drive stock prices and approval rates: they always change somebody’s mind. So there is for sure a non-zero causal effect of these things. You can have a test for “small enough to matter,” but I don’t think n-sigma is a good way to do that, and I think you’re really forced to talk about plausible effect sizes. Your burden-of-proof formulation isn’t wrong per se; I just think it’s very limiting.

(a) Smaller effects might well be interesting, but that’s just the point Andrew has made many times: against a noisy background, there’s no chance to pull out a small signal. The signal might be interesting if you could prove it existed, but you can’t.

(b) Model uncertainty is less of a problem for stock market models because we have strong reasons to believe that something like the standard market model works. That said, there can be issues with that model, but all that will do is raise the boundary on big movements.

(c) I’m not even sure what it would mean to call a debate or corporate news event “identical.” No such beasts.

All that said, I still don’t see what’s wrong with phrasing this as a causal/counterfactual problem. “This is the sort of move that occurs 1 day in 300 in the market after adjusting for the sort of news that affects stocks generally. I claim that this day was unusual because the following thing was reported widely. Without arguing I’ve proved causality, I claim to have at least shifted the burden to you to show some other reason the price change was this big.”

You can update your state of knowledge using the data/likelihood to get the posterior distributions for the parameters. You don’t update your distributions for the data, but now you have an updated model, so you can produce a new predictive distribution for the future data. It’s not clear what it means to have a joint generative model of all variables. Let’s say I’m doing a regression of height on age: do I have to be able to generate pairs (age, height)? To analyse the data I only need to model the height as a function of age, because everything is conditional on the age values (which are a given).
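That flow can be made concrete with a toy grid sketch (made-up numbers, a single slope parameter, noise sd assumed known): the slope’s distribution is what gets updated, the ages are only conditioned on, and the updated model then yields a predictive distribution for a future height.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 4.0                            # assumed-known noise sd (illustration only)
age = np.array([2., 4., 6., 8., 10.])  # ages are given; we only model height | age
height = 6.0 * age + rng.normal(0, sigma, age.size)

# Parameter model ("prior") over the slope b
b_grid = np.linspace(0, 12, 1201)
prior = np.exp(-0.5 * ((b_grid - 5.0) / 3.0) ** 2)
prior /= prior.sum()

# Likelihood of the observed heights, conditional on the observed ages
loglik = np.array([-0.5 * np.sum((height - b * age) ** 2) / sigma**2 for b in b_grid])
post = prior * np.exp(loglik - loglik.max())
post /= post.sum()                     # the parameter distribution is what gets updated

# The data distribution is not "updated"; the updated model instead gives a
# new predictive distribution for a future height at a new age
new_age = 7.0
pred_mean = np.sum(post * b_grid) * new_age
```

No generative model for the ages ever appears: the pairs (age, height) are never jointly simulated, only height given age.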

Of course I can change the model to introduce latent variables (i.e. new parameters) where I had data before and say that now the data are the observed values, which are noisy, censored, etc. And in that case you need to model the presence of the cat explicitly. But I would say that in the new model there are still parameters and data clearly identified as such. Maybe the distinction is not “deep”, though. I will be interested in seeing how you present these issues in the new edition of your book.

It avoids a lot of complications (and is one member of an equivalence class of functions of a parameter), so you just lose the complicated math. No one is likely to think that ratios of probabilities should integrate to one.

Maybe just present the joint model before and after the data: the ratio of the (marginal over possible data) joint model before, divided by the (conditional) joint model after the data, to display the effect of observing that data in that joint model.
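A toy discrete sketch of that ratio (hypothetical coin-bias grid): conditioning on the data multiplies the “before” model by likelihood/evidence, so the before/after ratio directly displays what observing the data did at each parameter value.

```python
import numpy as np

# Hypothetical coin biases and a "before" parameter model
theta = np.array([0.2, 0.5, 0.8])
prior = np.array([0.25, 0.5, 0.25])          # p(theta) before any data

y, n = 7, 10                                  # observed heads in n tosses
lik = np.array([t**y * (1 - t)**(n - y) for t in theta])   # p(y | theta)

evidence = np.sum(prior * lik)                # p(y), marginal over possible data
posterior = prior * lik / evidence            # model after conditioning on the data

# The update ratio posterior/prior equals likelihood/evidence for every theta:
# it shows exactly where the data moved probability mass
ratio = posterior / prior
```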

See also my comment just below on y free vs y0 fixed etc. I don’t object to e.g. calling p(y|theta) a data model and p(theta) a parameter model, but I do object to the confusing misuse of the term ‘likelihood’.

But there are a mass of distributions which satisfy this definition for “prior” which have very different properties from what people intuitively expect. For example, it’s possible for a “prior” according to this definition to depend on the number of data points.

None of this would be an issue if statisticians could think their way out of a paper bag. It would be sufficient to just say that “prior” makes sense in some instances, but in most cases you need to examine the equations to see what’s what and think things through each time. But they obviously can’t think their way out of a paper bag, so controversies that would be trivially solved in other communities become century-long debates that show no sign of ever being resolved.

A good first step in cleaning house in statistics, though, would be to get rid of anything that can’t be given a precise mathematical definition. Whatever monumental problems p-values have, they can at least be given a precise definition. “Prior” distributions can’t. So get rid of the term.

Likelihoods and priors are different. But they are also alike. Since the same probability assignment can serve as both “prior” and “likelihood” in the same model, depending upon whether a case is observed or not, clearly they have a lot more in common than the usual teaching approach suggests. Presenting them as different animals blocks solutions, I argue.

For applied mathematicians, maybe these issues of terminology aren’t important. But students from the sciences, in my experience, aren’t served when we use frequentist terms to describe Bayesian concepts. It leads to common conceptual errors.

The same holds in reverse, I think: describing frequentist inference with Bayesian concepts distorts learning.

Ultimately, I think the words we use to describe the math are always going to have flaws. If the words were sufficient, we wouldn’t need the math. So multiple, shifting frames are often needed to make sense of complex topics.

Also nice to see the issue of logical omniscience and bounded rationality discussed there.

“Variability” refers to natural variation in some quantity. It’s also called aleatory (from Lat. aleator, gambler) uncertainty in some fields.

“Uncertainty” (in this usage) refers to the degree of precision with which a quantity is measured. It may be called epistemic uncertainty or fuzziness in some fields

Examples:

• The amount of a certain pollutant in the air is variable, since it varies by location and by time.

• The amount of a certain pollutant in the air is often uncertain — we usually can’t measure it accurately, and often we can’t measure it at all.

We can model both “variability” and “uncertainty” by random variables.
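A minimal sketch of that last point (made-up pollutant numbers): one random variable captures site-to-site variability of the true level, and a second captures the measurement uncertainty layered on top of it.

```python
import numpy as np

rng = np.random.default_rng(2)

# Variability (aleatory): the true pollutant level differs across locations
n_sites = 1000
true_level = rng.lognormal(mean=3.0, sigma=0.5, size=n_sites)  # hypothetical ug/m^3

# Uncertainty (epistemic): each measurement adds instrument noise around the truth
measured = true_level + rng.normal(0, 5.0, size=n_sites)

# Both sources of spread are modeled as random variables; the observed spread
# combines the natural variation with the measurement error
var_true = true_level.var()
var_measured = measured.var()
```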

If you just said – I don’t think Aitkin’s approach is practical/relevant to my work – in your review, fine. It would also be a much shorter review.

The issue is all the other stuff about how it’s not Bayesian, we’re Bayesian etc. And the original bolded quote that I called (and still think is) rather hypocritical.

Ahhh, but I *don’t* think the approach in that book is relevant for applied work! I do think that there are lots of non-Bayesian statistical ideas that are relevant for applied work. But I don’t think the ideas in that book fall into that category.

> “We analyze in this note some consequences of the inferential paradigm adopted therein, discussing why the approach is incompatible with a Bayesian perspective but nevertheless is relevant for applied work.”

The full sentence: “We analyze in this note some consequences of the inferential paradigm adopted therein, discussing why the approach is incompatible with a Bayesian perspective and why we do not find it relevant for applied work.”

> We do not claim here that Aitkin’s approach is wrong per se, merely that it does not fit within our inferential methodology, namely Bayesian statistics, despite using Bayesian tools.

Your inferential methodology is Bayesian but his is not? Who decides?

> We analyze in this note some consequences of the inferential paradigm adopted therein, discussing why the approach is incompatible with a Bayesian perspective

I guess we didn’t write that review article clearly, so let me explain right now: Our point was not “he’s not doing real Bayes.” One clue to this is that we never said such a thing! Our point in the quoted passage was that the work in question represented “solutions to problems that seem to us to be artificial or conventional tasks with no clear analogy to applied work.” The book could’ve been 100% Bayesian and we’d still have a problem if there were no clear analogy to applied work. Conversely, the book could’ve been 0% Bayesian and we’d have no problem if the models made sense.

Also, we never said anything like, “we are the true priests of Bayes who decide what is and isn’t Bayesian.” Again, a useful clue here is that there was no such quote in our article; you had to make it up! I’m open to all sorts of philosophies of statistics and I think it’s best to evaluate them on how they work in real examples.

As to why we wrote the article in the first place: I can’t remember. I think Christian received the book in the mail and then we discussed it with Judith, and we wrote down our thoughts.

Again the religious analogy seems ironic since it comes across a little ‘we are the true priests of Bayes who decide what is and isn’t Bayesian’ despite not having a fully-formed/consistent definition of what is and isn’t Bayesian.

If you can be eclectic when it comes to foundations why can’t others? Which is to say, I’m confused about why you wrote such a long, dismissive review in the first place?

Disclaimer: I met Murray Aitkin once recently and, while I don’t totally buy his position (not that I buy many if any), this was (it seems to me) basically his complaint about your review and I had to agree.

I agree that, like an event study, the question here seems to be not a literal change/no-change one but a causal/counterfactual one. But I think it’s still an ill-posed question even in that form. Stock event studies are a good analog, since they’re both basically the aggregation of a bunch of underlying dynamic opinions. If some news about a company causes one investor to sell, then there’s clearly some causal effect of the news, just as a debate might cause one person to disapprove of a candidate.

But a 2-sigma test has no power to detect that, and really isn’t even looking. It can only answer something very narrow: after controlling for all the things I think might predict my series, is there a >= 2-sigma residual that I’m confident could be from no other cause but the event (debate, SEC filing, etc.)? That then excludes from my study any effects smaller than 2 sigma, which might be important and interesting! And it also leans a lot of causal inference on a given forecast model — once you factor in model uncertainty, your power for identifying interesting effects is pretty much nil. So you not only don’t get to phrase your question as an NHST problem, you don’t even get to phrase it as a causal inference or counterfactual problem. Then what are you doing?
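The power point can be made exact. A quick sketch (noise sd normalized to 1, effect size hypothetical) of the probability that a single-event 2-sigma test flags a real but small effect:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def power_two_sigma(effect_in_sigmas):
    """Probability a single residual exceeds the 2-sigma cutoff when the
    true event effect is effect_in_sigmas (noise sd = 1)."""
    # Reject when |residual| > 2, with residual ~ N(effect, 1)
    return (1 - normal_cdf(2 - effect_in_sigmas)) + normal_cdf(-2 - effect_in_sigmas)

small = power_two_sigma(0.5)   # a real but small effect: detected ~7% of the time
zero = power_two_sigma(0.0)    # no effect at all: the ~4.6% false-positive rate
```

A half-sigma effect is flagged barely more often than pure noise, so a single-event 2-sigma test essentially cannot distinguish “small effect” from “no effect.”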

Without recourse to the CLT by looking at lots of identical debates or corporate news events, I don’t think even the barest dressings of reject/accept logic are very useful for these situations.

Setting aside the appropriateness of the Unitarian analogy, I disagree with your comment regarding Bayesian foundations. I take Bayesian foundations very seriously and have written a lot on the topic!

But seriously, I think I have a situation where I can define something, let’s label it A, as a formal system with some very simple requirements… and then probability theory becomes a model of (a subset of) that formal system. It’s now safe to say that “probability is a kind of A”….

I’m not giving away all of it yet, because I’m writing a paper basically as we speak, but the upshot is “A” is something that looks a lot more like “testing a model’s explicit assumptions” than “inferring a probability that statement X is true”

I think at this point, it’s fair to say “I don’t want to do A” and then go off and do something else, but A is a pretty simple thing in my conception, and it’s a fairly obvious thing for a scientist to want to do. The only thing I don’t have is a uniqueness result. I’m not sure probability theory is isomorphic to A, only isomorphic to a subset of A.

> Once you have abandoned literal belief in the Bible, the question soon arises: why follow it at all?

seems pretty hypocritical given your attitude to Bayesian foundations.

2. I would bet that in 20 years papers with p-values and ideas of null will be viewed as quaint. I hope statistics is young enough to reconstitute itself. I would expect it will become much more visual as the application to geometry develops so you can see the potential and probable spaces. (I almost said the application of choice to geometry because all decisions within any model space are choices and thus ….) I wonder how long it will take people to grasp that 2D graphics aren’t good enough given the 3D rotational potential in any virtual space like a freaking web browser. By good enough, I mean, for example, that rotation reveals the distortions imposed by compressing complexity to a plane, same as a 3D scan of the liver versus a compressed planar view, and yet you have to spend so much time talking about bleep flat graphs and even charts that don’t show any recognition they’re something that maps to a probability space. I find, for example, a lot of the analysis through logarithms becomes more interesting when you see the pathway made by the application.

3. As fun, I think the conception of Unitarian as mentioned misses a point and evidences an implicit choice function – as any implication/definition/implementation must. That is, Unitarianism was not a single but many choice interpretations of an inherently unstable synthesis. It may help to think of that as a field, the field being the relationship between at least Jesus, the Holy Spirit and whatever one labels ‘God’. This synthesis to me mirrors the Biblical conception of who begat whom, which Judaism picks up in Abraham, Isaac and Jacob and which generalizes to past, present, future and of course strings of choice (but that is harder to explain) or what I like to point out at Seders that we all eat the afikommen because we are all the middle matza, because we live and carry with us the past and potential for our own and others futures. The issue of stability is pretty basic: you can formulate a line of God-Jesus-Holy Spirit and even reverse that but then you are arbitrarily fixing the chain, much like when you impose a restriction or type or even an artificial ‘universal set’. One might hope ‘we all agree on this chain’ but understanding changes with context so as much as ‘we’ would try to hold on to the G-J-HS string questions would arise about G because G affects your understanding of J and you have to ask how much of HS is in this world now versus the G before the J and thus what is J in this world, etc. The straw man idea that one could have literal interpretation proves the impossibility of literal interpretation when one breaks the ideas of literal interpretation – of a specific Trinity, of a specific text, etc. – down to the choices implicit in the statements of these ideas. Or bluntly, you got to start counting and there’s always basic questions: where do you start from and what do you call start and where do you count to and how do you know you’re there? 
You could find in that statement the Ancient Greek paradoxes of motion, including what became infinitesimals, then limits, then transfinite numbers, and Tom Stoppard’s rendition in Jumpers of the old joke that, since the arrow first crosses half the distance and then half the remaining distance, St Sebastian died of fright. I would also quote the old camp song: we’re here because we’re here because we’re here because we’re here

– which I find is one of the best statements of the halting problem. So you pick a ‘stable’ definition for the G-J-HS field and hope it lasts like a proton! It can’t, because it ain’t a proton; it’s an unstable ‘molecule’, ready and willing to bond with any idea that comes along, from messianic claims to utter denials. I’m looking at a 10-week-old puppy, so I’m stopping now.