It’s not so hard to move away from hypothesis testing and toward a Bayesian approach of “embracing variation and accepting uncertainty.”

There’s been a lot of discussion, here and elsewhere, of the problems with null hypothesis significance testing, p-values, deterministic decisions, type 1 error rates, and all the rest. And I’ve recommended that people switch to a Bayesian approach, “embracing variation and accepting uncertainty,” as demonstrated (I hope) in my published applied work.

But we recently had a blog discussion that made me realize there was some confusion on this point.

Emmanuel Charpentier wrote:

It seems that, if we want, following the conclusion of Andrew’s paper, to abandon binary conclusions, we are bound to give:

* a discussion of possible models of the data at hand (including prior probabilities and priors for their parameters),

* a posterior distribution of parameters of the relevant model(s), and

* a discussion of the posterior probabilities of these models

as the sole logically defensible result of a statistical analysis.

It seems also that there is no way to take a decision (pursue or not a given line of research, embark or not on a given planned action, etc.) short of a real decision analysis.

We have a hard time ahead of us selling *that* to our “clients”: after more than 60 years of hard-selling them NHST theory, we have to tell them that this particular theory was (more or less) snake oil aimed at *avoiding* decision analysis…

We also have hard work to do in order to learn how to build the necessary discussions, which can hardly avoid involving subject-matter specialists: I can easily imagine myself discussing a clinical subject, possibly a biological one; I won’t touch an economic or political problem with a ten-foot pole…

Wow—that sounds like a lot of work! It might seem that a Bayesian approach is fine in theory but is too impractical for real work.

But I don’t think so. Here’s my reply to Charpentier:

I think you’re making things sound a bit too hard: I’ve worked on dozens of problems in social science and public health, and the statistical analysis that I’ve done doesn’t look so different from classical analyses. The main difference is that I don’t set up the problem in terms of discrete “hypotheses”; instead, I just model things directly.

And Stephen Martin followed up with more thoughts.

In my experience, a Bayesian approach is typically less effort and easier to explain, compared to a classical approach which involves all these weird hypotheses.

It’s harder to do the wrong thing right than to do the right thing right.

78 thoughts on “It’s not so hard to move away from hypothesis testing and toward a Bayesian approach of ‘embracing variation and accepting uncertainty.’”

  1. The key problem seems to be how to summarize the “results” of the analysis. Plotting posterior distributions is fine, but one has to draw a conclusion from them, which brings us back to making binary decisions (effect present/absent).

    • Exactly, a posterior is great when interpreting the data at hand. And then I have to decide whether I will do a follow-up study, or abandon a line of research and focus on something else. This is a dichotomous choice, and as researchers we are faced with this type of decision constantly. I’d love to follow up on a study with an 84% posterior probability, but unless I find a way to distribute my actions across parallel universes, I do a follow-up study, or I don’t.

      • I think the biggest problem here is conflict of interest / principal-agent problem.

        The researcher would “Love to” spend money + have a career + feed their family + whatever. Although most researchers would like to get some useful result for society if they could, they recognize that they won’t always do so. The NHST paradigm gives an excuse to say “see I didn’t waste your money we rejected the null X times in this paper, so we found out X facts about the world”. Of course, this isn’t anything like true, but it does then help justify giving the next grant so the researchers can spend more money…

        People who have the self-reflection sufficient to recognize that this is at best an unsatisfactory career, and at worst outright fraud (Wansink etc) just get the heck out of Academia.

        Here’s what an academic decision analysis probably ought to look like:

        1) List out a plausible set of future studies that you are qualified to do
        2) Quantify the number of people in the population who are affected by the issue you are studying
        3) Quantify the marginal effect of a “successful” study (let’s say reducing the time to a cure for a disease by x years, or whatever… this is the real soul-searcher, where many academics will find that the marginal value of their research is exceedingly close to zero, but I think you could argue that just “knowing the truth about X” is worth a few dollars or pennies or something per person in the US, provided X isn’t too trivial. Of course, the researcher gets the same value “money spent / career continued / children fed” regardless of study, so it *shouldn’t* enter into the calculation, but it also shows why “any grant will do” from the researcher’s perspective.)
        4) Using prior studies, perform a decision analysis across the choices of follow-up studies and choose the one with the largest expected value (a toy sketch of this calculation follows).
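        A toy sketch of step 4 in R; every study name, probability, and dollar figure below is invented purely for illustration, and a real version would plug in posterior summaries from the prior studies:

        # Hypothetical candidate follow-up studies; all numbers are made up
        studies <- data.frame(
          name             = c("study_A", "study_B", "study_C"),
          p_success        = c(0.10, 0.30, 0.05),  # chance the study "succeeds"
          people_affected  = c(2e6, 5e4, 1e7),     # size of the affected population
          value_per_person = c(1.00, 5.00, 0.10),  # marginal value of success, $/person
          cost             = c(5e5, 2e5, 1e6)      # cost of running the study, $
        )

        # Expected net value of each candidate, then pick the largest (step 4)
        studies$expected_value <- with(studies,
          p_success * people_affected * value_per_person - cost)
        studies[which.max(studies$expected_value), ]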

        Now, someone will tell me it’s “too hard” to do a good job of this, but it’s not “too hard” to do a study on time domain integral transforms for the solution of space vehicle stabilization for landing on asteroids or comb through 100 gigabytes of public use microdata on the incidence of “enjoying cat pictures” pre and post the first release of the Netscape browser or whatever.

        I think the surrogate for this decision analysis is more or less:

        1) Think of all the grants you can write.
        2) Think of how much money you are likely to get if your grant is approved.
        3) Put probabilities on whether the grant will be approved; grants tend to be approved when lots of preliminary study data involving lots of NHST rejections have already been produced, “proving” that there is an effect to be studied.
        4) Write grants in order of largest expected dollar gain for the lab.

        • Also, I should say, Andrew, that this paradigm shows why the grants for Stan are a HUGE win for society. The outcome is “tens of thousands of researchers have a tool to improve the quality of their science…” which means that Stan’s expected value to society is *tens of thousands of times* larger than a typical academic grant.

        • But that’s unlikely to sway typical academic granting agencies…

          Let applicant’s peers decide who/what gets funded.

          Let the journals and university publicists justify the value of that funding for us.

          Don’t let anyone else be sufficiently enabled to do any alternative credible assessments of our funding.

        • Sometimes I wonder if academia would benefit from moving to some kind of “adversarial model of publication”.

          So every result ought to be critically questioned, a la judicial courts, by a panel of designated players in the academic publishing model.

          I mean even a PhD defense is a cakewalk by the standards of a courtroom argument.

        • This seems like a case of the grass always being greener on the other side.

          I’m all for finding ways to improve academic publishing models, but I don’t understand the repeated suggestions to use the adversarial court system as a positive model. The suggestion seems built on an idealized notion rather than on how evidence and argument actually work in real court proceedings.

          The players in the (US adversarial) court system are explicitly incentivized to make biased arguments in their own favor without regard to truth. The ideal is that the judge/jury somehow can see the best argument through the fuzz that both sides are intentionally creating. But in practice, that very very often does not happen. And for obvious reasons.

          There’s a lot of interesting legal scholarship about the ways in which the system fails to achieve its ideals, and the pros and cons of non-adversarial vs. adversarial court systems. I’d be very interested to see someone who wants academic publishing to be like the courts actually spin out the full vision of how it would work, complete with suggestions for avoiding the problems courts actually see in practice.

        • I actually think open recorded/tracked commentary on published work provides a lot of the benefits of an adversarial process. In my proposal for a peer-to-peer publication system, the existence of a UUID for all publications makes it easy to query for articles/commentary referring to the given UUID, so a “comment section” comes “free” with the indexing:

          http://models.street-artists.org/2017/02/23/public-distributed-cryptographic-scientific-publication/

        • I agree (to the extent I understand the proposal). That’s an “enough transparency so anyone can call BS and everyone can see it” model, which I completely agree with. I don’t see how importing ideas and standards from courtrooms adds anything useful.

          The principle that scientists should take adversarial positions with respect to all work (including their own) is great. In fact, we already have that principle in science. People are successfully pushing for changes by showing how our current systems fail to live up to that principle.

          But adversarial doesn’t necessarily imply “like the courts.” Thank goodness.

        • Jason:
          This looks like interesting legal scholarship (from reading some of the preview cases):
          Beyond Legal Reasoning: A Critique of Pure Lawyering, by Jeffrey Lipshaw

        • from the book blurb: “…”pure lawyering” of traditional legal education is agnostic to either truth or moral value of outcomes. He demonstrates pure lawyering’s potential both for illusions of certainty and cynical instrumentalism…”

          Looks about right. Do you have a link to the cases you mention? I couldn’t find anything more extensive than the preface.

        • The NHST paradigm gives an excuse to say “see I didn’t waste your money we rejected the null X times in this paper, so we found out X facts about the world”. Of course, this isn’t anything like true, but it does then help justify giving the next grant so the researchers can spend more money…

          Or, put more rudely: It is a sham. Huge amounts of your tax/donation money is being wasted by people doing the sham, and your tuition money is paying for the brightest people in your society to be taught to do it. Further, this has been going on for decades. I really don’t think this is a hysterical point of view, it is just true.

      • +1 to @Daniel Lakens:

        >>> And then I have to decide whether I will do a follow-up study, or abandon a line of research and focus on something else. This is a dichotomous choice, <<<

        This is the crucial point. I get frustrated when people don’t acknowledge that there’s often a dichotomous choice at the end that you just cannot wish away. The dichotomy is no abstract approximation or false simplification made by silly conventional researchers; it is the reality we live in.

        An alternative response that is equally frustrating is “Just use decision theory and actual utility functions etc. to map the posterior into an actual decision,” but 99% of blog posts and papers won’t go into the messy details of how to get to a decision and will just conveniently stop at the posterior.

        Another mystifying response I get is of the sort “The practitioner in the field (e.g. the clinician) is best placed to convert our academic model’s continuous output into a dichotomous decision.” Well, sorry, but aren’t you just avoiding the hardest part of the job? What use is your posterior to a clinician or ER surgeon; do we expect him to process the posterior into a dichotomous decision in his brain, intuitively?

        • Rahul,

          I’m not sure you read this post correctly. The point is that we don’t care about dichotomous decisions between one stupid hypothesis and one plausible one.

          If you have a 95% credible interval of r=[.01,.05], you can easily make a decision here: It’s small; too small to realistically care about.

          It’s still a decision; it’s just irrelevant to the question of “is r=0 or is r != 0”; Bayesians would see that and say “it’s probably positive, but too small to care about, move on.”

          I think some comments are misinterpreting the ‘dichotomous decision’ thing. It’s not that one doesn’t /make/ a decision, it’s that the decision isn’t between some null model and an alternative model; it’s about what one should do after seeing the data, which the posterior informs.
          A Bayesian might see r=[-.03,.03] and say “well, we have no idea what the sign is, but it’s irrelevant really because it’s so small no one should care”; an NHSTer would say “NOOOOO it’s not different from 0!”

          Both make decisions; the Bayesian just doesn’t treat 0 as some special strawman to beat down. 0 is just another possible value on the real line. One doesn’t need to use some formal decision analysis with loss or utility functions, blah blah blah; for applied cases that might make sense, for basic research maybe not. You just make a decision based on the posterior given your data, rather than ruling out a hypothesis you don’t care about anyway.
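          A minimal sketch of that kind of posterior summary in R; the draws and the 0.05 practical-significance threshold below are made up for illustration, and in practice the draws would come from your fitted model:

          set.seed(1)
          r_draws <- rnorm(4000, mean = 0, sd = 0.015)  # pretend posterior draws of r

          # Threshold below which the effect is too small to care about (a domain judgment)
          practical_threshold <- 0.05

          quantile(r_draws, c(0.025, 0.975))        # roughly [-0.03, 0.03] here
          mean(r_draws > 0)                         # P(r > 0): how sure are we of the sign?
          mean(abs(r_draws) > practical_threshold)  # P(|r| is big enough to matter)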

        • Right, but “you just make a decision based on the posterior given your data” is really just an informal version of the formal decision analysis. That is, it’s based on what you expect the benefits to be, given knowledge about the probable range of parameter values.

          If I do an NHST and prove to you beyond a doubt that “drug X is better than placebo” the implied next step is “use/approve/Rx the drug”, but it is still a completely and totally valid decision to say “I will not use the drug” because after your Bayesian analysis you see that the *amount by which it is better than placebo is meaninglessly small, and the dollar/side-effect costs are not zero*

          Whether a formal function is programmed into a computer and a formal decision analysis is performed using a posterior sample out of Stan, or you just look at the 95% HPD interval for the parameter and say “gee it’s obvious from this information that I should do X” you still *base your decision on the magnitudes* not on the answer to a “yes/no” hypothesis test.

        • And, elaborating on the drug analogy of the second paragraph: The clinical trial “drug X is better than placebo” interprets “better than” in terms of a mean, a proportion, etc. So it may be better than placebo for some people and worse for others.

          (That being said, there are some cases where one might want to make a yes/no decision as a rule that will give you a certain outcome in the long run — e.g., some industrial situations: a rule “shut down the process and tune it if the number of defective parts is greater than x” could be good as a long term strategy — as long as letting some defective parts slip through does not have serious consequences.)

        • DL: If I do an NHST and prove to you beyond a doubt that “drug X is better than placebo” the implied next step is “use/approve/Rx the drug”, but it is still a completely and totally valid decision to say “I will not use the drug” because after your Bayesian analysis you see that the *amount by which it is better than placebo is meaninglessly small, and the dollar/side-effect costs are not zero…”

          GS: But, of course, after looking at 25 individuals using an SSD I can say, “It worked really well – far better than placebo – in 3 individuals.” Not “meaninglessly small” to those three. Now, one might notice such individual dramatic cures in a group design experiment, but only because the illness stays around and, thus, there is a baseline (of illness) upon which to make an inference about individuals. But the reasoning is still affirming the consequent and it is still implicitly an SSD. In order to be surer in these circumstances, one would have to do reversals – i.e., do an SSD in those individuals. Somehow this possibility (using SSDs) never seems to come up. Go figure.

        • In many cases, the researcher using Bayesian methods isn’t in the right position to make the decision. So publishing a posterior distribution of the parameter and stopping there shouldn’t be dismissed with the complaint that “99% of blog posts and papers won’t go into the messy details of how to get to a decision and just conveniently stop at the posterior.”

          If I’m trying to find out how much I know about how well people in the third world will do if you feed them de-worming pills… I should publish a posterior distribution over, say, the associated QALY loss. But it would be wrong to then go ahead and say “the world should value a QALY loss of a West African child at $X and the cost of the pill program is $Y and so the expected return on this project is $Z,” because the people who need to think hard about those values are different people.

          Now, that being said, if you’re working on a policy-relevant problem, I think it’d be great to publish a posterior sample out of Stan and an R script which, if you provide it with a Value_i function for each of several possible choices, will calculate a posterior expected value for each choice… That’d be AWESOME. Then people could actually sit down and start talking about the appropriate evaluation of the value of the choices they are making, and have that evaluation be informed by actual posterior distributions from academic inference.
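          A sketch of what such a script could look like, assuming a published CSV of posterior draws with one column per parameter; the file name, column name, and Value functions below are all hypothetical placeholders:

          # Posterior draws published alongside the paper, one row per draw
          draws <- read.csv("posterior_draws.csv")   # hypothetical file with a column "qaly_effect"

          # One Value function per possible choice, supplied by the decision-maker;
          # these particular valuations are placeholders, not anyone's actual numbers.
          value_fns <- list(
            fund_program   = function(d) d$qaly_effect * 50000 - 2.00,
            fund_half_dose = function(d) d$qaly_effect * 0.5 * 50000 - 1.00,
            do_nothing     = function(d) 0
          )

          # Posterior expected value of each choice = average of Value over the draws
          expected_values <- sapply(value_fns, function(v) mean(v(draws)))
          expected_values
          names(which.max(expected_values))  # the choice with the highest expected value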

        • >>> In many cases, the researcher using Bayesian methods isn’t in the right position to make the decision. <<<

          Who is better placed to make the decision, then? Who’s the right consumer for your posterior, who can process it into a decision?

  2. I agree with a lot of what Emmanuel wrote, but I think he misplaces what should always be covered (at some time, in thoughtful scientific analyses) regardless of whether Bayesian or frequentist techniques are used. (His comment “I can easily imagine myself discussing a clinical subject; possibly a biological one” I take to mean that he can rely on past work in those areas.)

    If it is to be science, at some point all issues need to be thoroughly considered. Most of my published work involved frequentist techniques, and I did try very hard to address these things.

    I now think the same can be done with a Bayesian approach with far less work and drastically fewer restrictions – which one can expect will make for better consideration of the issues discussed here: http://statmodeling.stat.columbia.edu/2017/03/08/applying-statistics-science-will-likely-remain-unreasonably-difficult-life-time-no-intention-changing-careers/

    On the other hand, I think it is a bit silly to think that this can be put into an n-page ASA guidance document that folks of all backgrounds can easily follow or teach others with.

    • “I think it is a bit silly to think that this can be put into an n-page ASA guidance document that folks of all backgrounds can easily follow or teach others with.”

      +1

  3. I’m afraid I missed the earlier discussion (and cannot find a link to it), but surely it is not suggested that Bayesian methods are the only alternative to NHST? Information theoretic approaches fit the bill, right?

    • Joseph:

      1. The link to the earlier discussion is right there in the post above.

      2. I agree that Bayes is not the only alternative to NHST. In fact, some people do Bayesian NHST, which I really hate! You can think of a Venn diagram where Bayes and NHST are overlapping circles.

        • Rahul:

          I don’t really know what is meant by “information theoretic approaches” but I agree with the general point that there are non-Bayesian ways of doing statistics without NHST.

        • I suspect “information theoretic approaches” are along the lines of a combination of maximum entropy and minimum description length (coding theory) approaches.

          https://en.wikipedia.org/wiki/Minimum_description_length

          I think there are Bayesian aspects to MDL type procedures, so again, probably overlap.

          One thing that is known, however, is that either “information theoretic approaches” are a subset of Bayesian approaches, or they violate Cox’s theorem, in which case they don’t agree with Boolean logic in the limit of infinitely precise information… and you’d have to give me a strong argument for why that would be a good thing.

        • I understand “information theoretic approaches” to include things like Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and other **IC’s that are typically used for model selection. They all have their dubious aspects, and the question of making the decision of interest is not automatically solved by choosing a model.
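          For concreteness, a minimal base-R sketch of that kind of model selection, with simulated data rather than anyone’s real example:

          set.seed(2)
          x <- runif(100)
          y <- 2 + 3 * x + rnorm(100)      # simulated data, purely for illustration

          fit_linear    <- lm(y ~ x)
          fit_quadratic <- lm(y ~ x + I(x^2))

          # Lower AIC/BIC is "preferred" by these criteria, but as noted above,
          # picking a model this way still doesn't make the decision of interest for you.
          AIC(fit_linear, fit_quadratic)
          BIC(fit_linear, fit_quadratic)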

  4. What is the scope of inference if we embrace uncertainty and accept variation?
    For example, I conduct a clinical trial and get 85% posterior probability that active is better than placebo.

    What does that 85% mean if we embrace uncertainty and accept variation?

    • No no no no no no no no!!!

      You do a study and find “the probability that the outcome under active is X amount better than placebo is p(X).” You then do a decision analysis on the net value received by the patient if you treat with active and the actual “amount better” is X, and choose active if integrate(Value(X) * p(X) dX) is greater than zero.
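      With posterior draws of X in hand, that integral is just a Monte Carlo average over the draws; the draws and the Value function below are made up for illustration:

      x_draws <- rnorm(4000, mean = 0.3, sd = 0.5)  # hypothetical posterior draws of "amount better"

      # Hypothetical net value to the patient of treating when the true benefit is X
      # (benefit in dollars minus side-effect and drug costs)
      value <- function(x) 1000 * x - 400

      # Monte Carlo estimate of integrate(Value(X) * p(X) dX)
      expected_value <- mean(value(x_draws))
      if (expected_value > 0) "treat with active" else "don't treat"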

      • (I’m not sure I see why it has to be “No no no”, especially since I agree with your proposition in principle.)

        The question is about the scope of inference if we embrace variation and accept uncertainty, which you don’t avoid by any decision analysis.

        In any event, the relevant currencies defining ‘value’ for your decision analysis depend on the stakeholder, so it isn’t easy to generalize the results of the decision analysis to arbitrary populations (e.g. patients, patients’ families, healthcare providers, legislators, advocacy groups, insurance companies, etc.). This key point makes it hard to discuss the results of a particular trial in, say, JAMA in terms of decision analysis.

        • I see. I get it, and I find that investigators are very open to that ‘revision’.

          Device/drug company sponsors are a different story, especially if they are already thinking of the sound bite that will go into their next TV commercial.

  5. If we’re being honest, the math/stats is also harder. I’m working through BDA3 right now, and it’s a lot less forgiving than “traditional statistics.” Now, that’s probably a feature. The ease of traditional statistics, regressions, and p-values comes at the cost of a much higher chance of generating nonsense or silly results.

    As excited as I am to learn all this, figuring out how to frame my problems in a Bayesian framework, then writing additional code in Stan, is an order of magnitude harder than work I did in the past where you throw some data into Stata and get some fast results. Again, I agree this is a feature, but it is harder for those of us unlike our author with IQs below 150 :)

    • “IQ below 150” is an unnecessarily dichotomous way of thinking and not very useful.

      You would be much better off thinking about that as a continuous function mapping perceived hardness on to IQ.

    • Natasha:

      Your comment reminded me that “traditional statistics” can mean very different things – for instance, in Great Britain it seemed to primarily mean knowing how to fit linear and generalised linear models rather than anything to do with the semantics and/or mathematics of, say, data-generating models (likelihoods).

      For instance, when I mentioned to Doug Altman in 2001 that generalised linear models were based on likelihoods, he objected with a “No, they are not.” Showing him a chapter from McCullagh and Nelder resolved that, but it was clear he had no interest in thinking/working that through (at the time), and that seemed to be the general sentiment of most of the practicing statisticians I met there. (My adviser also told me this was his experience – “they are very good at running software to fit standard models.”)

      So the difficulty will depend on the background your “traditional statistics” provides you.

      (Have you thought of mixing in McElreath’s book in your learning?)

    • A lot of the math is harder purely due to the terrible formulas for the probability densities. Conceptually, the math isn’t particularly hard. Most of it is just algebra. At this point, it’s not like I sit down and try to integrate probability densities; that’s why Stan exists, so why bother doing it by hand?

      I would look for the forest, not the trees. The concepts of Bayes, imo, are much easier to understand than the math underlying frequentist stuff. You ever try to analytically derive the sampling distribution of multilevel model parameters? You ever try to analytically derive even things like Welch’s t-test, or more complicated ANOVA designs? If you did, you’d see the math is still complicated; we just ignore it because our tools do the hard work for us at this point :p.

      • That’s a fair point. Although, and this is my perception so far as a newbie, there is a certain level of comprehension of the entire model and data space required to set informative priors. Which, again, I guess is a feature, since it means you need to understand exactly what it is you’re modeling. But it takes some getting used to for me.

        I’ll keep at it.

      • Right, I think getting bogged down in the complexity of high dimensional integration etc is not the way to go. For learning, start by thinking about the problem along the lines of pure determinism. Can you write a formula that would tell you how, deterministically, feeding snakes a diet of X would change their molting rate F…. or whatever it is that you’re studying… think about the *subject matter* content and pretend you have an oracular source that you can ask “what number should I plug in for the coefficient q in my model”… and then just write it down as q and wait until later.

        Once you have a conceptual mathematical model of the *subject matter*, move on to the fact that some things aren’t known perfectly due to measurement error, lack of perfect modeling, etc. “Fuzz out” your predictions using choices of probability distributions that represent the way in which you *really are* fuzzy in your prediction ability or measurement ability. For example, if I know that something must be positive and can sometimes go near zero… I need to build in that fact. I probably shouldn’t use

        Data ~ normal(Prediction,Spread)

        because the normal will have a fair amount of probability below zero when Prediction is near zero… that kind of thing.

        Finally, program your model into Stan and run it, and then check the output through a variety of “sanity checks” as discussed elsewhere.
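        To make that last step concrete, here is a minimal sketch in R/rstan; the snake-diet example, the variable names, and the priors are all hypothetical, just illustrating a positive-only outcome modeled with a lognormal rather than a normal:

        library(rstan)

        model_code <- "
        data {
          int<lower=1> N;
          vector[N] x;            // e.g. amount of diet X
          vector<lower=0>[N] y;   // e.g. molting rate, known to be positive
        }
        parameters {
          real a;
          real b;                 // the coefficient 'q' we would have asked the oracle for
          real<lower=0> sigma;
        }
        model {
          a ~ normal(0, 1);
          b ~ normal(0, 1);
          sigma ~ normal(0, 1);
          // lognormal keeps y positive; a + b * x is the prediction on the log scale
          y ~ lognormal(a + b * x, sigma);
        }
        "

        # Fake data just to show the call; sanity-check the fit as discussed elsewhere
        fake <- list(N = 20, x = rnorm(20), y = exp(rnorm(20)))
        fit <- stan(model_code = model_code, data = fake, chains = 2, iter = 1000)
        print(fit)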

        Operationally, this is the way to get started, just as for learning linear regression getting a bunch of example datasets and project descriptions and running a bunch of “lm” commands in R is probably a lot better than learning about the properties of the Singular Value Decomposition of a non-square matrix (the means by which one implements the lm function).

      • Stephen:

        Agree, and I would go further: calculus and algebra can be skipped over while still getting to realistic but not too high dimensional Bayesian analysis – https://phaneron0.wordpress.com/2012/11/23/two-stage-quincunx-2/

        Although you then encounter – as Natasha points out below – difficulties of “comprehension of the entire model and data space” which hits newbies more immediately and abruptly than slowly working through the usual calculus and algebra based introductions. I previously put this as problems with appreciating abstract representations, working with them and then relating that to empirical questions.

    • Natasha, I’d replace “it is harder for those of us unlike our author with IQs below 150” with “it is harder for those of us, who, unlike our author, have not had early exposure to the kinds of thinking that are involved.”

      I think there are two conclusions to be drawn from this:

      1. As several who have responded to your comment have pointed out: Newbies need to be encouraged to keep at it; it’s like learning anything new and different (e.g., learning to play a musical instrument). Things that initially are difficult become easier as you get used to them.

      2. Proponents of Bayesian methods need to watch out for saying things that sound like “It’s easier than you think” to people who have not yet had adequate experience with the types of thinking that are involved.

  6. These are different:
    1. Is my hypothesis true?
    2. Should I take action X?

    I say this because many commenters seem to conflate these: we need to do a hypothesis test because ultimately we need to either do a specific thing or not do it.

    Although one can perform a full decision analysis to make a yes/no decision that incorporates uncertainties — see this article Andrew and I (and others) wrote many years ago for example — even without that analysis a person with reasonable statistical savvy is usually better off with an estimate + uncertainty than with a yes/no on statistical significance. An effect can be statistically significant but not practically significant, if you have a big enough sample; and it can be practically significant but not big enough to be worth paying for.

    I agree with many commenters that full decision analyses are rarely conducted, and that this is not likely to change anytime soon: such analyses are often hard to do, as was the case with the paper I mentioned above. But I think people are much better off working with something like “If we take action X, it will cost $D +/- d and have effect Z +/- z” than “if we take action X, it will cost $D +/- d and have a statistically significant effect.”

    • I’d much rather that people spend the time and effort to take the problem to its logical conclusion which would be a full decision analysis. If they publish fewer papers, so be it.

      The alternative, of maximizing output by just side-stepping the hardest part of the analysis isn’t helping much, I feel.

      • I don’t think you are right on this one. There are several practical reasons why the statistical inference and the decision analysis generally take place separately. I would guess most ‘estimation’ studies generate results that will be relevant to many different decisions. Similarly, most serious real-world decision analyses must draw on a range of different evidence — there is no single study that estimates all the parameters (or if there were, it would actually be a collection of sub-studies). Requiring a one-stop-shop analysis would be hugely inefficient.

        This is not at all to defend p-value heuristics, but highlight the critical challenge, which (as I see it) is to define the appropriate statistics for generalizing from an individual study to all the decisions that it might relate to.

        • “…define the appropriate statistics for generalizing from an individual study to all the decisions that it might relate to.”

          GS: Isn’t that suggesting that generality of a finding is discernable in an individual study?

        • Sorry — if it does suggest that, it is not intended.

          The question is about how we can, as succinctly as possible, summarise the findings of one analysis so they can be used for a different one. If the decision analysis is not yet specified, reporting anything less than the full joint posterior runs a risk of losing some important information. Yet does anyone really report this? Maybe a downloadable csv file with posterior draws? That doesn’t seem to be common anyway. At the other end, reporting focussed on p-values removes a lot of the information, and allows readers to confuse p<0.05 for a valid decision rule.

        • If I ran a journal, I’d require a Bayesian analysis of any data, a SQLite3 file with the data, whatever code was used to do the analysis, and a CSV file with the posterior draws as a condition of even getting a review.

        • Anonymized data if necessary of course. If the data were very sensitive, I’d even accept a program that outputs fake data which has been tested to conform to many many computational frequentist tests to show that the fake data was “as good as” the real data in some high dimensional distributional sense.

          I think I’d still require an encrypted SQLite3 file containing the real data and a contractual agreement to supply the encryption key if the IRB approved a specific request… With a hefty contractual fine for failure to supply it.

          I don’t think we should fool around.

        • > discernable in an individual study?
          Agree – that is a big (and hopefully decreasingly common) mistake.

          In the discussions here it might have been implicit that generalizing from an individual study might include how to CONTRAST it with other studies and then generalize from all of them together as appropriate.

      • Outside of academia, publishing papers is rarely the goal and indeed is rarely involved at all.

        Right now I’m working with an electric utility in California that is trying to make decisions about things like how much more infrastructure they need in order to charge electric vehicles (EVs), and how to design rate plans that will incentivize desired charging behavior such as charging at off-peak times. Writing a paper has nothing to do with it.

        From an academic perspective this would be a great case study for a full decision analysis. As a practical matter, though, the complexity would be mind-boggling. A key problem with a full decision analysis is that you have to specify things like utility functions that are often poorly understood by the people who end up making the decisions. There may not even be a well-defined utility function in a meaningful sense: decision-making is often distributed (e.g. among utility leadership, the Public Utilities Commission, and other entities), and the makeup and mindset of those entities can change with time, on a timescale comparable to the rate at which the decision analysis can be performed.

        To repeat a point Andrew has made, though, I think they might find a lot of value in figuring out what they would want to include in such an analysis, even if they aren’t going to do it: this would help figure out what issues are the most important ones. For instance, how big a factor is, say, the cost of land for additional electric substations, compared to the cost of the equipment in those substations, and how does this vary by location throughout the service area? The number of EVs that will be on the road is uncertain by a factor of at least 2, and the kWh each of those vehicles will charge per day is uncertain by a large factor as well; how important are these uncertainties together and separately (when it comes to the peak power the number of vehicles is most important; when it comes to total energy it’s the number times the mean kWh)…and on and on and on.

        I am a fan of doing a full decision analysis when feasible, but I also understand why they are usually not performed.

        • Phil:

          Indeed. My comment was mainly addressing academic publishing; most of the work I can read is academic. The non-academic work is probably very interesting and possibly more useful, but it’s so often not publicly available.

        • There’s that Eisenhower quote (possibly apocryphal) about how plans are useless but planning is essential. I think some here are seeing an analogy with formal decision theory. The decision reached by the formal analysis may not be useful, because you couldn’t precisely specify all the possible scenarios for each choice, their utilities and probabilities, but at least by trying to specify them you might get an idea of which bits are relatively more or less important and which bits are more or less known/quantifiable.

          And a story from Persi Diaconis:


          To be honest, the academic discussion doesn’t shed much light on the practical problem. Here’s an illustration: Some years ago I was trying to decide whether or not to move to Harvard from Stanford. I had bored my friends silly with endless discussion. Finally, one of them said, “You’re one of our leading decision theorists. Maybe you should make a list of the costs and benefits and try to roughly calculate your expected utility.” Without thinking, I blurted out, “Come on, Sandy, this is serious.”

        • Great story, Kit, I will repeat it.

          And, by the way: Martha, I always appreciate your comments. I say this because, although they sometimes prompt responses, they don’t always do so, and on this thread it looks like they have been. Perhaps I should get in the habit of just doing the “+1” thing, but I feel kinda silly doing that. It’s a pity we can’t just click a “+1” button or something.

  7. I was expecting Andrew to object to being characterized as advocating “a discussion of the posterior probabilities of these models.” I usually hear him say exactly the opposite—we even have a slide to that effect in our courses (emphasizing that we’re not interested in rejecting models or even the probability of them being true).

    We are, on the other hand, very interested in comparing their estimates and their predictions. Andrew states as much on the very first page of Bayesian Data Analysis!

  8. I do understand some of the issues here. Many who read on this blog about the issues related to the dreaded “researcher degrees of freedom” would conclude that what is being proposed is that analysts always follow a predefined script, with absolutely no innovation or insight involved. Of course, if this were the case, it would seem to contradict the whole issue of “model building” and the context-driven model-building discussions that appear on this blog as well.

  9. I think you (Andrew) are going to have to redo the Stats 101 curriculum before the switch you advocate can occur, except among certain quantitatively literate folks with a relatively deep interest in this subject. As it is, statisticians have spent 70 years trying to drill in the logic of NHST, and most researchers expect their statistical expeditions or collaborations to yield yes/no, real/not-real dichotomous answers.
    When you answer with, “no, but can’t you see that I’ve quantified the uncertainty (in the posterior)?” they say, “yes, but what does that mean? (i.e. how can I shoehorn this into yes/no-real/not-real? Which is ultimately all about, how can I convince the “skeptical” reviewer which of my results are Gospel Truth and which are not?)”.
    The whole thing is hopeless until Stats 101 gets a total makeover, or just goes away…

    • Chris:

      I agree 100%. I used to teach intro stat every year, but I’ve pretty much avoided doing it for 15 years because I didn’t feel comfortable with what I was teaching. My colleagues and I do have a plan for constructing a new intro course but we still haven’t put it together.

  10. Wow! I didn’t think that my side remarks were worth such a discussion. And I think I’ve been unclear: I had two main ideas:

    1) All the parts of the statistical process (from initial bibliography to results discussion) need to be discussed.

    2) Any not-totally-certain result (i.e., not a theorem) needs (something amounting to) a decision analysis.

    In order to (try to) be clearer, I’ll just clarify the first point. I might discuss the second one if/when the opportunity arises.

    My point was (is) that what we think of as a (or “the”?) “statistical analysis” is only a part of the solution to the problem of understanding the results of an experiment. In all cases, these “statistical” results have to be re-framed in terms of the subject matter. And there is a lot more to discuss than accepting or rejecting a hypothesis (in the case of a frequentist NHST), stating (an approximation of) a confidence interval (frequentist estimation), stating (an approximation of) a posterior distribution (Bayesian estimation) or (an approximation of) a Bayes factor (or analogous comparisons, such as choice of a model via the quality of the cross-validations…).

    Any of these “results” is the result of a chain of inferences and hypotheses; each link in this chain has to be discussed in order to accept (or not) the “result”. For example, the choice of a given shape for a likelihood, or the choice of a transformation of variables, deserves discussion; similarly, the definition of a target population and the choice of a sampling method need to be discussed.

    These discussions cannot be made in abstracto; their relative (possible) influence on “the results” is an important part of the assessment of these results.

    For example, when analyzing a biological experiment conducted on a well-defined animal species, the possible influence of the sampling method used for treatment allocation probably has a much weaker effect on “the results” than the choice of the dependent and independent variables, their transformations, the shape of the likelihood and (in the Bayesian case) the priors and their justification.

    A contrario, in a descriptive analysis of a case series of patients presenting some rare disease in a tertiary care hospital, the (non-)sampling and the (lack of) control of treatment allocation are known to have a major influence on “the results” and dwarf the possible influence of variations in analysis technique.

    All those discussions require collaboration between the statistician and the subject-matter experts; this collaboration is, in fact, a decision analysis, relating the choices made before and during the experiment/observation and its analysis to the conclusions that can be drawn from “the results” of that experiment/observation and its analysis.

    In this context, learning to present “the results” as a set of possible models and estimates of their parameters is, in fact, only a part of the work to do. I agree with Andrew Gelman that this presentation is not *too* different from “classical” analyses. And subject-matter specialists with a real understanding of the meaning and purpose of a statistical analysis won’t have much problem accepting this presentation.

    The problem exists, however, with subject-matter specialists (and other “clients”, such as sponsors) expecting (demanding?) “sharp” answers (for example: “the studied factor has (or has not) an effect on the main judgment criterion”): “selling” our discussions to those people *is* hard.

    (But not for technical reasons: the issues here are of an organizational, sociological, political and, ultimately, philosophical nature. We may be wandering a bit outside the reference domain of this blog…)

    • > this collaboration is, in fact, a decision analysis
      Agree – there is an unavoidable decision to stick with a current interpretation of something, given the evaluation that there is currently no economical way to revise it into something hopefully better. That decision should involve all who could inform it – for instance, if someone is aware of a very similar study or potential source of information, a decision to stick may be a very poor decision.

      I do think those sorts of things should be discussed on this blog, and, for instance, the link I gave above was to a post mostly about “having to have much more than adequate knowledge of statistical methods and modelling but also a rather advanced grasp of what science is or should be as well as what to make of the output of statistical modelling (e.g. posteriors) beyond the mathematical summaries of the modelling”.

  11. Andrew,

    You write “And I’ve recommended that people switch to a Bayesian approach, “embracing variation and accepting uncertainty,” as demonstrated (I hope) in my published applied work.”

    I have two questions regarding this:

    1. What would you say is the paper that you’re most proud of from a technical Bayesian perspective? Or alternatively, what’s a paper that a newcomer to Bayesian statistics could learn the most from?

    2. Is there a paper or example where one could see the same problem treated from both a Bayesian and a frequentist perspective?

    This would, for me at least, be very helpful for understanding why you prefer Bayesian to frequentist methods. Apologies if you have answered this question somewhere else before. Thanks!

    Cheers,

    Toby

  12. It may be of interest to some here to note that, philosophically, the Bayesian view is contextualistic, while frequentist views are, philosophically, mechanistic.

    https://en.wikipedia.org/wiki/Contextualism

    https://en.wikipedia.org/wiki/Mechanism_(philosophy)

    The following book review is written specifically to discuss the “philosophical core” of behavior analysis and to contrast it with other approaches to psychology, but it contains a good general description of contextualism and mechanism (and two other “world hypotheses” that Pepper judged to be “relatively adequate”):

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1338844/pdf/jeabehav00033-0099.pdf
