I love this paper. Here’s the abstract (yes, it’s too long, I know):

Exploratory data analysis (EDA) and Bayesian inference (or, more generally, complex statistical modeling) – which are generally considered as unrelated statistical paradigms – can be particularly effective in combination. In this paper, we present a Bayesian framework for EDA based on posterior predictive checks. We explain how posterior predictive simulations can be used to create reference distributions for EDA graphs, and how this approach resolves some theoretical problems in Bayesian data analysis. We show how the generalization of Bayesian inference to include replicated data y.rep and replicated parameters theta.rep follows a long tradition of generalizations in Bayesian theory.

On the theoretical level, we present a predictive Bayesian formulation of goodness-of-fit testing, distinguishing between p-values (posterior probabilities that specified antisymmetric discrepancy measures will exceed 0) and u-values (data summaries with uniform sampling distributions). We explain that p-values, unlike u-values, are Bayesian probability statements in that they condition on observed data.

Having reviewed the general theoretical framework, we discuss the implications for statistical graphics and exploratory data analysis, with the goal being to unify exploratory data analysis with more formal statistical methods based on probability models. We interpret various graphical displays as posterior predictive checks and discuss how Bayesian inference can be used to determine reference distributions.

We conclude with a discussion of the implications for practical Bayesian inference. In particular, we anticipate that Bayesian software can be generalized to draw simulations of replicated data and parameters from their posterior predictive distribution, and these can in turn be used to calibrate EDA graphs.
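
To make the abstract a bit more concrete, here is a minimal sketch of a posterior predictive check. It is an illustration only, not code from the paper. It assumes a simple normal model with the standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2, and the function name `posterior_predictive_pvalue`, the simulated data, and the choice of test statistics are all made up for this example. The p-value is estimated as the posterior probability that the antisymmetric discrepancy T(y.rep) - T(y) exceeds 0.

```python
import random
import statistics


def posterior_predictive_pvalue(y, stat, n_sims=2000, seed=0):
    """Estimate Pr(stat(y_rep) - stat(y) > 0 | y) by simulation.

    Assumed model (illustration only): y_i ~ Normal(mu, sigma^2) with
    prior p(mu, sigma^2) proportional to 1 / sigma^2.
    """
    rng = random.Random(seed)
    n = len(y)
    ybar = statistics.fmean(y)
    s2 = statistics.variance(y)  # sample variance, n - 1 denominator
    t_obs = stat(y)
    exceed = 0
    for _ in range(n_sims):
        # sigma^2 | y is scaled inverse chi-square with n - 1 degrees of freedom
        chi2 = rng.gammavariate((n - 1) / 2, 2.0)  # chi-square(n - 1) draw
        sigma2 = (n - 1) * s2 / chi2
        # mu | sigma^2, y ~ Normal(ybar, sigma^2 / n)
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        # replicated dataset y_rep, same size as the observed data
        y_rep = [rng.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
        # antisymmetric discrepancy: T(y_rep) - T(y)
        if stat(y_rep) - t_obs > 0:
            exceed += 1
    return exceed / n_sims


# Simulated, deliberately skewed data (exponential), so the normal model is wrong.
data_rng = random.Random(1)
y = [data_rng.expovariate(1.0) for _ in range(50)]

p_min = posterior_predictive_pvalue(y, min)                # sensitive to skewness
p_mean = posterior_predictive_pvalue(y, statistics.fmean)  # the model fits the mean by construction
print(p_min, p_mean)
```

The minimum exposes the misfit, since normal replications routinely go well below the smallest observed value. The mean cannot, because the model is fit to it. In the graphical version of the same idea, one would plot histograms of a handful of the `y_rep` datasets next to the histogram of `y`, so that the replications serve as the reference distribution for the plot.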

Also this paper.

You haven't told us why you love it, and I for one would like to know!

It seems to me that in your paper the connection between EDA and Bayesian inference rests on your claim that "EDA is based on models". Is it really? You present two methods often used in EDA that incorporate models (two-way plot fitting and hanging rootograms), but are these representative of EDA as a whole? What is the model behind a scatterplot? Behind a histogram?

I like the definition of EDA at NIST: "EDA is an approach to data analysis that postpones the usual assumptions about what kind of model the data follow with the more direct approach of allowing the data itself to reveal its underlying structure and model."