## Avoiding only the shadow knowing the motivating problem of a post.

Given that I am starting to make some posts to this blog (again), I was pleased to run across a YouTube video of Xiao-Li Meng being interviewed on this very topic by Suzanne Smith, the Director of the Center for Writing and Communicating Ideas.

One thing I picked up was to make the problem being addressed in any communication very clear, as there should be a motivating problem – the challenges of recognising and defining a problem should not be overlooked. The other was that the motivating problem should be located in the sub-field(s) of statistics that address such problems.

The second is easier, as my motivating problems mostly involve ways to better grasp insights from theoretical statistics in order to better apply statistics in applications – so the sub-fields are theory and application, going primarily from theory to application. This largely involves trying to find metaphors or, even better, various ways to re-represent theory in terms that are more suggestive of how and why it works (or hopes to work). Vaguely (and overly hopefully), the aim is to get diagrammatic representations that provide a moving picture of how and why it works or hopes to – to see representing (modelling) at work.

At a very general level, my current sense is that statistics is best viewed as being primarily about conjecturing, assessing, and adopting idealised representations of reality, predominantly using probability generating models for both parameters and data. We want less wrong representations of reality, and hopefully we can get them. This can only be a hope, as we never have direct access to reality to know for certain. In light of this, my motivating problem is how to get less wrong representations of reality while remaining hopeful.

This representation of reality venture can be broken into three stages:

1. Speculate a prior distribution for how unknowns (e.g. parameters) were determined or set in nature and then observations subsequently generated given those unknowns.

2. Deduce the most relevant representation given the actual observations that occurred (aka getting the posterior).

3. Evaluate the fit and credibility of the representation in light of 1 and 2, with prejudice for finding faults (ways to improve), returning to 1 until no further improvement seems currently plausible.

Steps of 1,2,3; 1,2,3; 1,2,3 – hold for now and hope.
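The three stages above can be sketched in a few lines of code. This is only an illustrative toy – the conjugate Beta-Binomial model, the made-up data (7 successes in 20 trials), and the particular posterior predictive check are my assumptions for the sketch, not part of any post to come:

```python
# A minimal sketch of the speculate / deduce / evaluate loop, using a
# conjugate Beta-Binomial model so the posterior is available in closed form.
import random

random.seed(1)

# Made-up observations: 7 successes in 20 trials.
n, y = 20, 7

# Stage 1: speculate a prior for how the unknown rate theta was set.
a_prior, b_prior = 2.0, 2.0  # Beta(2, 2): mildly favours middling rates

# Stage 2: deduce the representation most relevant to the observations.
# Beta-Binomial conjugacy gives the posterior Beta(a + y, b + n - y) exactly.
a_post, b_post = a_prior + y, b_prior + (n - y)

# Stage 3: evaluate the fit, with prejudice for finding faults.
# Posterior predictive check: simulate replicated datasets and ask
# whether the observed count would be surprising under the model.
def sim_rep():
    theta = random.betavariate(a_post, b_post)
    return sum(random.random() < theta for _ in range(n))

reps = [sim_rep() for _ in range(5000)]
p_check = sum(r <= y for r in reps) / len(reps)
print(f"posterior mean: {a_post / (a_post + b_post):.3f}")
print(f"posterior predictive Pr(y_rep <= y): {p_check:.3f}")
```

If the check flagged a fault (an extreme predictive probability), one would return to stage 1 with a revised prior or likelihood – the 1,2,3; 1,2,3 cycling of the text.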

Now, the full theory I am trying to draw from is mostly statistical, but some theory of (profitable) empirical inquiry (aka philosophy) is required, as the aim is to enable others to avoid being misled when trying to learn from observations while being aware of the risks they are unable to avoid.

In summary, my future posts will have these motivations, most likely focusing on the speculation of _good_ priors and on evaluating representations – fitting, understanding, criticising, and deciding (tentatively) to keep them. This should not be taken as suggesting that getting posteriors is less important – but that is not my strength (and I am hoping Stan will increasingly make it simple in more and more cases).

1. Jonathan says:

Not exactly on point, but I think the field can’t be better understood and applied without changing many of the basic terms. Example: you incorporate a double negative in the concept of disproving a null thesis, which itself is a bunch of words that really doesn’t mean what it purports to say because you aren’t “disproving” – misuse of the word proof – and “thesis” is a garbage word, sometimes replaced by other garbage words that don’t really say what is being nulled. And look at the word “null”: it is a higher level concept that confuses the fuck out of people because it means anything from not in existence anywhere to not in existence in this narrowly defined array/set (and then to, not in existence in this particular ordering – until you find an ordering in which null matters, of course!). Your conceptions rely on double negatives that have complex meanings so no bleeping wonder they’re hard to apply!

I’m also amazed by the persistence of old concepts like “Type 1 errors” that require mental translation from the already difficult “inappropriate rejection of a null”, as though that means something more than wrong answer. Even the concept of “false positive” doesn’t work because sometimes your null may be that nothing is supposed to happen but it did, so the answer was wrong even though the result was actual, though the words necessarily imply that nothing happened. Same with so-called Type 2, of course: the logic behind these ideas is so relentlessly tangled in reversals of sign of meaning as to be nearly opaque. Isn’t this obvious from the continual misunderstandings of basic statistics by scientists? And I note that many of these papers and ideas are in fact run by or through statisticians and yet still completely suck. Even accepting the terminology, there are many kinds of Type 1 errors, so the idea should have been broken into bits decades ago and discarded as inadequate.

Sorry. I have a cold coming on and am cranky.

• Keith O'Rourke says:

> the field can’t be better understood and applied without changing many of the basic terms.
I agree, but here I think we need to aim at getting a pragmatic grade of understanding of the concept the term represents – what to make of an occurrence of it for how we think and act. My kick at parts of the can of worms you opened up is here, at the very bottom: http://www.healthnewsreview.org/toolkit/tips-for-understanding-studies/mixed-messages-about-statistical-significance/

> And I note that many of these papers and ideas are in fact run by or through statisticians and yet still completely suck.
That is something I have noticed too: some (many? most?) statisticians in practice just fall into going along with providing the complete certainty of analysis that the textbook pseudo-logic falsely assures.

Part of the challenge here is to get better ways of assessing repeated performance of both Bayesian and Frequentest (Canadian spelling?) methods such as type S and M errors.
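To make type S and M errors concrete, here is a minimal simulation sketch in the spirit of Gelman and Carlin's design calculations – the true effect size, standard error, and significance threshold are made-up numbers for illustration only:

```python
# Sketch of type S (sign) and type M (magnitude/exaggeration) errors:
# among estimates that reach statistical significance, how often is the
# sign wrong, and by how much is the magnitude exaggerated on average?
import random

random.seed(2)

true_effect = 0.1   # small true effect (illustrative)
se = 0.5            # noisy measurement (illustrative)
z_crit = 1.96       # two-sided 5% threshold

sig = []
for _ in range(100_000):
    est = random.gauss(true_effect, se)
    if abs(est) / se > z_crit:      # "statistically significant" result
        sig.append(est)

# Type S: proportion of significant estimates with the wrong sign.
type_s = sum(e < 0 for e in sig) / len(sig)
# Type M: average exaggeration of magnitude among significant estimates.
type_m = sum(abs(e) for e in sig) / len(sig) / true_effect
print(f"type S rate: {type_s:.2f}")
print(f"type M (exaggeration) ratio: {type_m:.1f}")
```

With a true effect this small relative to the noise, significant results get the sign wrong a substantial fraction of the time and exaggerate the magnitude severalfold – which is exactly the repeated-performance question such assessments are meant to answer.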

Hope your cold is better ;-)

2. jimmy says:

welcome back, keith!

3. Mark Palko says:

Might be a bit general for what you have in mind, but George Pólya’s books on plausible reasoning are always worth a look. (If it was good enough for Lakatos…)

• Keith O'Rourke says:

Mark: From my recollection of reading Polya as an undergrad, it probably is worth a second look.

On the other hand, there is always the challenge of what (risky) reading to undertake ;-)

Today’s post by Andrew nicely locates the focus I had in mind “the greatest benefits of the Bayesian approach come not from default implementations, valuable as they can be in practice, but in the active process of model building, checking, and improvement.” http://andrewgelman.com/2016/12/13/bayesian-statistics-whats/

So: how to better focus on, conceptualize, and practice that active process of model building, checking, and improvement?