Tim Bock writes:
I understood how to address weights in statistical tests by reading Lu and Gelman (2003). Thanks.
You may be disappointed to know that this knowledge allowed me to write software which has since been used to compute many billions of p-values. When I read your posts and papers on forking paths, I always find myself in agreement. But I can’t figure out how they apply to commercial survey research. Sure, commercial research occasionally involves modeling, and hierarchical Bayes can work there, but nearly all commercial market research consists of producing large numbers of tables, with statistical tests used to help researchers work out which numbers on which tables are worth thinking about. Obviously, lots of false positives can occur, and a researcher can try to protect themselves by, for example:
1. Stating beliefs about important hypotheses before looking at the data.
2. Skepticism/the smell test.
3. Trying to corroborate unexpected results using other data sources.
4. Looking for alternative explanations for interesting results (e.g., questionnaire wording effects, fieldwork errors).
5. Applying multiple comparison corrections (e.g., FDR).
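To make item 5 concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, the standard FDR correction one would run over the p-values harvested from such tables. The function name and the p-values are illustrative, not from any real survey:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg FDR procedure: return a boolean mask marking
    which hypotheses are rejected at false discovery rate q."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    ranked = pvals[order]
    # Find the largest k with p_(k) <= (k/m) * q ...
    thresholds = (np.arange(1, m + 1) / m) * q
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        # ... and reject every hypothesis ranked at or below k.
        reject[order[: k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(benjamini_hochberg(pvals, q=0.05))
```

Note that the procedure rejects everything up to the largest qualifying rank, so a p-value can survive even when it individually exceeds its own threshold; that is what distinguishes the step-up procedure from a per-test cutoff.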
In Gelman, Hill, and Yajima (2013), you wrote “the problem of multiple comparisons can disappear entirely when viewed from a hierarchical Bayesian perspective.” I really like the logic of the paper, and get how I can apply it if I am building a model. But, how can I get rid of frequentist p-values as a tool for sifting through thousands of tables to find the interesting ones?
Many a professor has looked at commercial research and criticized the process, suggesting that research should be theory-led and that it is invalid to scan through a lot of tables. But such a response misses the point of commercial research, which is, by and large, an inductive process. To say that one must have a hypothesis going in is to miss the point of commercial research.
What is the righteous Bayesian solution to the problem? Hopefully this email can inspire a blog post that can help an industry.
My reply:

First, change “thousands of tables” to “several pages of graphs.” See my MRP paper with Yair for examples of how to present many inferences in a structured way.
Second, change “p-values . . . sifting through” to “multilevel modeling.” The key idea is that a “table” has structure; structure represents information; and this information can and should be used to guide your inferences. Again, that paper with Yair has examples.
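To illustrate the multilevel idea, here is a toy sketch of partial pooling under a normal-normal model, with made-up estimates and standard errors standing in for the cells of a survey table. Rather than testing each cell against zero, the model shrinks every cell toward a common mean, with noisier cells pulled harder; this is a crude empirical-Bayes approximation, not a full hierarchical fit:

```python
import numpy as np

# Illustrative data: estimated effect and standard error for each
# subgroup "cell" of a survey table (made-up numbers).
y = np.array([0.20, -0.05, 0.35, 0.02, -0.12, 0.08])
se = np.array([0.10, 0.15, 0.20, 0.08, 0.18, 0.12])

# Precision-weighted grand mean, and a crude method-of-moments
# estimate of the between-cell variance tau^2: total variance of the
# estimates minus the average sampling variance (floored at ~0).
mu = np.average(y, weights=1 / se**2)
tau2 = max(np.var(y) - np.mean(se**2), 1e-6)

# Partial pooling: each cell's estimate is pulled toward the common
# mean; the shrinkage factor is near 1 for precise cells, near 0 for
# noisy ones.
shrinkage = tau2 / (tau2 + se**2)
theta = mu + shrinkage * (y - mu)

for yj, sj, tj in zip(y, se, theta):
    print(f"raw {yj:+.2f} (se {sj:.2f}) -> pooled {tj:+.2f}")
```

The pooled estimates are never more extreme than the raw ones, which is exactly why the multiple comparisons problem recedes: the structure of the table supplies the regularization that ad hoc p-value corrections try to impose from outside.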
Third, model interactions without aiming for certainty. In this article I discuss the connection between varying treatment effects and the crisis of unreplicable research.