He’s looking for a textbook that explains Bayesian methods for non-parametric tests

Brandon Vaughan writes:

I am in the market for a textbook that explains Bayesian methods for non-parametric tests. My experience with Bayesian statistics thus far comes from John Kruschke's Doing Bayesian Data Analysis, but this book excludes non-parametric statistics. I do see that your text, Bayesian Data Analysis 3e, covers non-parametric statistics; however, does it specifically contain instructions on how to conduct Bayesian alternatives to tests such as the Mann-Whitney U test, Kruskal-Wallis, etc.? I'm fairly statistics-naive, being mostly self-taught, and am admittedly only recently aware of what non-parametric tests are. I'm currently a master's student, have taken a great liking to statistics, especially Bayesian methods, and am keen to learn more.

My reply: I don't think you need to do the Mann-Whitney U test, Kruskal-Wallis, etc. In BDA we briefly explain why we don't do the Wilcoxon test (see post here), and I think the same reasoning goes for these other tests. Just model what you want to model directly, and forget about the whole null hypothesis significance testing thing.

23 thoughts on "He's looking for a textbook that explains Bayesian methods for non-parametric tests"

    • Testing is a form of estimating within a model. (People generally don’t like to think of it this way though because it elucidates the simplicity of testing against a null hypothesis.)

  1. I’m trying to understand what “just model what you want to model directly” means.

    E.g., say I'm testing a de-foaming agent in lab experiments and the mean foam quantity of the runs with defoamer is slightly better (lower) than in the runs without.

    Conventionally, one uses some sort of NHST to report whether the difference is significant.

    In the “just model directly” approach how does one proceed?

    • A minimal Stan sketch (here normal() is just a placeholder sampling distribution; you could swap in gamma(), lognormal(), etc.):

      data {
        int<lower=1> N1;             // number of runs with defoamer
        int<lower=1> N2;             // number of control runs
        vector[N1] foamvoldefoam;
        vector[N2] foamvolcontr;
      }
      parameters {
        real mean1;
        real<lower=0> spread1;
        real mean2;
        real<lower=0> spread2;
      }
      model {
        // priors for mean1, spread1, mean2, spread2 go here
        foamvoldefoam ~ normal(mean1, spread1);
        foamvolcontr ~ normal(mean2, spread2);
      }
      generated quantities {
        real diffofmeans = mean2 - mean1;
        real pctred = (mean2 - mean1) / mean2;  // estimated percent reduction in means
      }
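      One way to run this and read the answer straight off the posterior (a sketch, assuming the program above is saved as defoam.stan and CmdStanPy is installed; the numbers are made up purely for illustration):

      # Hypothetical driver script: fit the model above and summarize the
      # estimated percent reduction directly, instead of reporting a p-value.
      import numpy as np
      from cmdstanpy import CmdStanModel

      # foam volumes, rescaled to be O(1) (e.g., fractions of the beaker volume)
      data = {
          "N1": 8, "N2": 8,
          "foamvoldefoam": [0.42, 0.39, 0.47, 0.40, 0.44, 0.38, 0.45, 0.41],
          "foamvolcontr":  [0.52, 0.49, 0.55, 0.50, 0.48, 0.53, 0.51, 0.54],
      }

      model = CmdStanModel(stan_file="defoam.stan")   # the sketch above
      fit = model.sample(data=data, seed=1)

      pctred = fit.stan_variable("pctred")            # posterior draws
      print("Pr(any reduction):", np.mean(pctred > 0))
      print("median percent reduction:", np.median(pctred))
      print("90% interval:", np.percentile(pctred, [5, 95]))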

        • Note in particular that you're free to assume the two sampling distributions are different, based on your knowledge of how the defoamer works. You're also free to use different kinds of parameters rather than a simple two-parameter mean and spread. Also, you can model mean1 and mean2 in terms of covariates, such as the concentration of the defoamer, the temperature of the different experiments, and so on.

        • +1. Embracing modeling gives you way more flexibility than canned tests, even for simple things like this.

        • It’s so hard to guess the priors / distribution / spread!

          It’s like I’m facing an explosion in my analyst degrees of freedom.

        • Typically it's not really that hard. You do a few things:

          1) Rescale your variables by order-of-magnitude estimates (don't measure volume, measure volume as a fraction of, say, the beaker…)

          2) Figure out what you logically know, such as that volume is a positive number.

          3) Pick a distribution for the data that mimics some of the features you need; in your case, for example, the gamma distribution is positive.

          4) Pick some values for the parameters that produce distributions under (3) that have the right order of magnitude. (I.e., we've rescaled so typical volume values are about 1, so it's no good if your priors on the parameters make the gamma distribution have typical values of 10^5.)

          5) Based on (4), choose priors that cover a reasonable range around the parameter values you chose in (4), allowing the machinery to "search" within a range that isn't a waste of time.

          Run the model, get the results, and check to see that they make sense.

          If you are worried a lot about sensitivity of the results to the model, redo the analysis with a different class of distributions, widen your prior range, etc., and then see if the results change. You'll find that if you have a straightforward problem like this and you alter the model form, it rarely has a big effect on the results. (Steps (1)-(5) are sketched in code below.)
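          Here's a small numpy sketch of steps (1)-(5) for the foam example (illustrative only; the ranges and numbers are made up): rescale so typical volumes are about 1, pick a positive distribution such as the gamma, choose parameter ranges whose draws have the right order of magnitude, and simulate from the prior to check.

          # Prior predictive check for the rescaled foam-volume model (illustrative only).
          import numpy as np

          rng = np.random.default_rng(1)

          # (1)-(2): volumes are rescaled to fractions of the beaker, so positive and O(1)
          # (3): gamma as a positive distribution with a mean and a spread
          # (4)-(5): central guesses of mean ~ 0.5 and sd ~ 0.1, with generous ranges around them
          n_sims = 1000
          prior_mean = rng.uniform(0.1, 1.0, n_sims)    # mean foam fraction
          prior_sd   = rng.uniform(0.02, 0.3, n_sims)   # spread of foam fraction

          # convert (mean, sd) to gamma shape and scale, then simulate fake data from the prior
          shape = (prior_mean / prior_sd) ** 2
          scale = prior_sd ** 2 / prior_mean
          fake_volumes = rng.gamma(shape, scale)        # one prior-predictive draw per prior draw

          print("5%, 50%, 95% of prior-predictive volumes:",
                np.percentile(fake_volumes, [5, 50, 95]))
          # If these are wildly off (say 1e5 instead of ~1), revisit the choices in (4)-(5).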

        • Also, in (3) it makes good sense to use a maximum entropy distribution, and in (5) you can often make your priors wider than you really think they need to be and rely on your data to constrain things, provided you don't have a tiny dataset (i.e., only a couple of observations).

  2. As far as I can tell, the advantage of non-parametric statistics is that they allow us to extract information from the data in (general) ignorance of the underlying generating model. My previous forays into Bayesian modelling have suggested that Bayesian models specialise in the opposite, namely being as explicit as possible about the underlying data generating model. Can Bayesian techniques be used in cases where one does not have any good information about what sorts of data generating processes are at play?

    • Yes. There's lots of work on Dirichlet processes for modeling unknown "generating processes", but I'd caution against using them widely. They're mainly interesting when you're explicitly trying to fit a frequency distribution; since Bayesian distributions are not necessarily forced to be frequency distributions, it can be meaningful to use parametric distributions in ways that frequentist philosophy wouldn't agree with. For example, if you assume normality and calculate a Bayesian result, the Bayesian result has a clear interpretation even if the underlying frequencies in your data did not have a normal shape.

        • It's been discussed widely here, at http://www.bayesianphilosophy.com, and by Jaynes in Probability Theory: The Logic of Science. In a Bayesian setting, the density measures the plausibility that a particular value will fall in the neighborhood of the point in question, rather than the frequency with which data from a long sequence will fall near that point.

          So, in this case, the interpretation of using the normal distribution to describe a particular data point is that you believe that THAT SPECIFIC data point will fall near the mean value of the normal to within an error whose size is a small multiple of the standard deviation (in other words, the value will be in the high probability region of that normal distribution). Since that specific data point has only one realization, it’s meaningless to discuss the frequency with which THAT POINT does anything.

          If you’re specifically imagining a repeatable experiment and want to model the frequency with which data is produced in a certain region, then a Dirichlet process could be a useful tool. For example, in a manufacturing setting, or a game of chance, or a noisy measurement device with a stable noise distribution where you’re interested in inferring the noise distribution (not the measurement!).
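          If you want a picture of what a Dirichlet process prior over an unknown frequency distribution looks like, here's a rough truncated stick-breaking sketch (settings made up for illustration; a real analysis would use a proper DP mixture model or package):

          # One draw from a Dirichlet process prior DP(alpha, G0) via truncated stick-breaking.
          import numpy as np

          rng = np.random.default_rng(0)

          alpha = 2.0                                  # concentration: small -> lumpier distributions
          K = 200                                      # truncation level
          atoms = rng.normal(0.0, 1.0, K)              # atom locations drawn from the base measure G0

          # stick-breaking weights: w_k = v_k * prod_{j<k} (1 - v_j), with v_k ~ Beta(1, alpha)
          v = rng.beta(1.0, alpha, K)
          w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))

          # the random discrete distribution sum_k w_k * delta(atoms_k) is one draw from the DP;
          # here we sample "noise" data from that random distribution
          noise = rng.choice(atoms, size=1000, p=w / w.sum())
          print("first few draws from the random noise distribution:", noise[:5])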

  3. Rank tests are rough, and sometimes rough is good.
    As a statistical advisor I have seen many studies in which there was a rather ridiculous gap between the complexity of the situation and the quality of the data, and I don't think that more modelling than the rank test requires would have helped my clients in any way. They were better served by good visualisation and, at best, the simplest and roughest of tests than by piling distribution on distribution as the Bayesians like to do.

    • Christian:

      Follow the linked post. What I recommend as an alternative to rank tests is not to "pile distribution on distribution" but rather to take the data, replace them by their ranks, and run regressions (or t-tests, in the simple case of a comparison of groups with no other predictors). I think this approach is simpler than the classical approach: instead of having a different named test for each problem, we separate what we're doing into two steps. First, replace the data by their ranks (trading off efficiency for robustness by throwing away some information); second, do the comparison we would always be doing.
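      In code, the two-step recipe might look something like this (a numpy/scipy sketch with made-up data; for two groups with no other predictors, the regression on ranks and the t-test on ranks are the same comparison):

      # Step 1: replace the data by their ranks. Step 2: run the usual comparison/regression.
      import numpy as np
      from scipy.stats import rankdata, ttest_ind, linregress

      rng = np.random.default_rng(0)
      control = rng.lognormal(mean=0.0, sigma=1.0, size=20)   # skewed, made-up outcomes
      treated = rng.lognormal(mean=0.4, sigma=1.0, size=20)

      # step 1: rank the pooled data (throws away some information, buys robustness)
      ranks = rankdata(np.concatenate([control, treated]))
      r_control, r_treated = ranks[:20], ranks[20:]

      # step 2a: the simple two-group comparison, done on the ranks
      print(ttest_ind(r_treated, r_control))

      # step 2b: the same comparison as a regression of ranks on a treatment indicator,
      # which is the form that extends to additional predictors
      x = np.concatenate([np.zeros(20), np.ones(20)])
      print(linregress(x, ranks))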

      • Yes, I know the linked post and commented there at the time. The "piling distribution on distribution" is about replacing testing by Bayes, and of course it only applies to somebody to the extent that they recommend this (which I think is sensible in some situations but not all).
        There's a comment above by Alessio Benavoli about using the Dirichlet process to do Bayesian nonparametric tests. I wonder why, if somebody wants to do tests like that, they need to introduce Bayes and Dirichlet in the first place.
