David Kaplan writes:
I came across your paper “Understanding Posterior Predictive P-values”, and I have a question regarding your statement “If a posterior predictive p-value is 0.4, say, that means that, if we believe the model, we think there is a 40% chance that tomorrow’s value of T(y_rep) will exceed today’s T(y).” This is perfectly understandable to me and represents the idea of calibration. However, I am unsure how this relates to statements about fit. If T is the LR chi-square or Pearson chi-square, then your statement that there is a 40% chance that tomorrow’s value exceeds today’s value indicates bad fit, I think. Yet, some literature indicates that high p-values suggest good fit. Could you clarify this?
My reply: I think that “fit” depends on the question being asked. In this case, I’d say the model fits for this particular purpose, even though it might not fit for other purposes.
And here’s the abstract of the paper:
Posterior predictive p-values do not in general have uniform distributions under the null hypothesis (except in the special case of pivotal test statistics) but instead tend to have distributions more concentrated near 0.5. From different perspectives, such nonuniform distributions have been portrayed as desirable (as reflecting an ability of vague prior distributions to nonetheless yield accurate posterior predictions) or undesirable (as making it more difficult to reject a false model). We explore this tension through two simple normal-distribution examples. In one example, the low power of the posterior predictive check is desirable from a statistical perspective; in the other, the posterior predictive check seems inappropriate. Our conclusion is that the relevance of the p-value depends on the applied context, a point which (ironically) can be seen even in these two toy examples.
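To make the “concentrated near 0.5” point concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration and is not claimed to reproduce the paper’s examples: a single observation y ~ N(theta, 1), a prior theta ~ N(0, A^2), the test statistic T(y) = y, and the hypothetical helper names `ppp_value` and `ppp_spread`. The p-value is computed exactly as in the quoted sentence, as the share of replicated values T(y_rep) that exceed the observed T(y).

```python
import numpy as np

def ppp_value(y, A, n_draws=2000, rng=None):
    """Monte Carlo posterior predictive p-value Pr(y_rep > y | y).

    Assumed toy model (illustrative only): y ~ N(theta, 1), theta ~ N(0, A^2),
    with test statistic T(y) = y.
    """
    rng = np.random.default_rng() if rng is None else rng
    shrink = A**2 / (A**2 + 1.0)  # posterior mean is shrink * y, posterior variance is shrink
    theta = rng.normal(shrink * y, np.sqrt(shrink), size=n_draws)  # draws from p(theta | y)
    y_rep = rng.normal(theta, 1.0)  # one replicated observation per posterior draw
    return np.mean(y_rep > y)  # share of replications exceeding the observed value

def ppp_spread(A, n_datasets=1000, seed=0):
    """P-values for many datasets simulated from the model itself (prior predictive)."""
    rng = np.random.default_rng(seed)
    theta_true = rng.normal(0.0, A, size=n_datasets)
    y = rng.normal(theta_true, 1.0)
    return np.array([ppp_value(yi, A, rng=rng) for yi in y])

# Compare a vague prior (large A) with a tight prior (small A).
for A in (10.0, 0.1):
    p = ppp_spread(A)
    print(f"A = {A}: mean p = {p.mean():.3f}, sd of p = {p.std():.3f}")
```

Under these assumptions, a vague prior (large A) lets the posterior track the data closely, so the replications center on the observed value and the p-values bunch up around 0.5. A tight prior (small A) leaves the p-values close to uniform. In both cases a p-value near 0.5 says only that, for this T, the data look typical of what the fitted model would produce.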