Priors are important in Bayesian inference.
Some would even say: “In Bayesian inference you can—OK, you must—assign a prior distribution representing the set of values the coefficient [i.e., any unknown parameter] can be.”
Although priors are put first in most expositions, my sense is that in most applications they are seldom considered first, are checked the least, and are actually fully comprehended last (or perhaps never fully).
It reminds me of the comical response of someone asked for difficult directions: “If I wanted to go there, I wouldn’t start out from here.”
Perhaps this is less comical: “If I am going to be doing a Bayesian analysis, I do not want to be responsible for getting and checking the prior. Maybe the domain expert should do that, or I should just accept the default priors I find in the examples sections of the software manual.”
In this post, I thought I would recall experiences in building judgement-based predictive indexes, where the prior (or something like it) is perhaps more naturally comprehended first, checked the most and settled on last. Here there is no distraction from the data model or posterior, as there usually isn’t any data, nor is any data anticipated soon; so it’s just the prior.
Maybe not at the time, but certainly now, I would view this as a very sensible way to generate a credible, informative Bayesian prior, one that involved intensive testing of the prior before it was finally accepted. Below, I recount one particular example of this that I was involved in about 25 years ago, as a prelude to investigating in later posts what might be a profitable (to a scientific community) means of specifying priors today.
We did write up the methodology at the time, but I think I can describe it well enough by recalling one of the more interesting applications: developing a predictive index for a children’s aid society so that they could predict whether a judge would find for child neglect (meaning the child would be put into foster care). Going to court to get an intervention was very expensive, and the children’s aid society wanted as good a sense as possible of how likely they were to succeed before proceeding.
The process involved identifying and recruiting a group of experts who together adequately spanned the knowledge around child protection and the courts’ involvement in it. The group comprised social workers from children’s aid societies as well as lawyers and judges with experience in child protection cases. Before the first meeting, each was interviewed by a consensus group facilitator to ensure they understood the task and were qualified and willing. In addition, the facilitator tried as best they could to get each individual expert’s sense of the dimensions (variables) such an index should have, how each dimension should be graded, and how an overall score should be derived from these. The facilitator then tried their best to form a naive consensus index from all the group members’ individual input, as a starting point for the first day of a two-day meeting. On the first day, the experts jointly discussed their views on the facilitator’s admittedly amateur attempt at a consensus index and tried to improve it. (The critical work of the facilitator was to ensure the work was done jointly and severally by all the experts, without the usual separate cliques spontaneously forming to do battle for their own superior view of what would be best.)
It was really important to get a credible, even if very tentative, consensus index by the end of the first day, as it was used to generate fake cases overnight for the group members to review and argue over, giving their individual subjective judgements of how likely each case would be to lead a judge to make a finding for child neglect. These subjective judgement scores were then compared to the score from the tentative consensus index. Then everything was revised: the variables, the scoring of the variables, and the subjective scores. Reviewing and scoring the fake cases also helped the group members reflect on how common cases like these were, and whether there were common cases that did not come up among the fake cases generated. As the meeting progressed, the cases were continually recalled for further discussion and revision of the subjective scores, the variables and the scoring of the variables, as well as possible modification of existing fake cases or creation of new ones to ensure coverage of all cases likely to arise in practice. A unique incentive was used to get the tentative consensus index completed by the end of the first day: dinner was provided on site but not served until a tentative consensus index was agreed upon. There was no such draconian incentive on the second day, and the consensus reached then was later confirmed by email shortly after the meeting (within a week or so) to allow for sober second thought.
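The overnight step (generate fake cases, score them with the tentative index, and bring back the ones where the experts and the index disagree) can be sketched in a few lines. This is only a minimal illustration under assumptions of my own: the dimension names are hypothetical, every dimension is drawn uniformly from levels 1 to 5, an expert's judged probability is turned into a yes/no prediction with a hypothetical `cutoff` of 0.5, and the tentative index is passed in as any function returning `True` when it predicts a finding for neglect. None of this is the procedure actually used at the meeting.

```python
import random
from typing import Callable, Dict, List, Sequence

# A fake case maps each (hypothetical) dimension name to a level from 1 to 5.
Case = Dict[str, int]


def generate_fake_cases(dimensions: Sequence[str], n: int, seed: int = 0) -> List[Case]:
    """Draw n fake cases, giving every dimension a level from 1 to 5 uniformly at random."""
    rng = random.Random(seed)  # seeded so the same overnight batch can be regenerated
    return [{d: rng.randint(1, 5) for d in dimensions} for _ in range(n)]


def flag_disagreements(
    cases: Sequence[Case],
    expert_probs: Sequence[float],
    tentative_index: Callable[[Case], bool],
    cutoff: float = 0.5,
) -> List[int]:
    """Return positions of cases where an expert's judged probability of a finding
    for neglect lands on the opposite side of `cutoff` from the index's prediction."""
    if len(cases) != len(expert_probs):
        raise ValueError("need exactly one expert probability per case")
    flagged = []
    for i, (case, p) in enumerate(zip(cases, expert_probs)):
        expert_predicts_finding = p >= cutoff
        if expert_predicts_finding != tentative_index(case):
            flagged.append(i)
    return flagged
```

The flagged cases are the natural agenda for the next morning's discussion. Uniform sampling is a simplifying assumption: it will over-represent unusual profiles, which is one reason the group kept modifying and adding fake cases to cover what actually arises in practice.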
The resulting index (as best I can recall) involved five dimensions: providing shelter, food, clothing, education and home supervision, each graded at levels one to five, with one being full provision and five being essentially no provision. The index scoring rule was simply that a judge would likely find for neglect if there was a five on any one dimension, or fours on three or more dimensions, and would not otherwise. It was an apparently sensible and credible predictive index in the eyes of the group of experts who developed it, along with some additional colleagues they could coerce into looking at it. (Unfortunately, I have no idea what happened with it.)
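The scoring rule, as recalled above, is simple enough to write down directly, which also makes it easy to check against any case the experts care to imagine. This sketch assumes the key names in `DIMENSIONS` (they are hypothetical labels for the five dimensions) and treats levels as integers from 1 to 5.

```python
from typing import Mapping

# The five dimensions as recalled in the post; the key names are hypothetical.
DIMENSIONS = ("shelter", "food", "clothing", "education", "home_supervision")


def predicts_neglect_finding(levels: Mapping[str, int]) -> bool:
    """Apply the recalled scoring rule to one case.

    Each dimension is graded 1 (full provision) to 5 (essentially no provision).
    Predict a finding for neglect if any dimension is at 5, or if three or more
    dimensions are at 4; otherwise predict no finding.
    """
    missing = [d for d in DIMENSIONS if d not in levels]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    values = [levels[d] for d in DIMENSIONS]
    if any(not 1 <= v <= 5 for v in values):
        raise ValueError("every level must be an integer from 1 to 5")
    if any(v == 5 for v in values):
        return True
    # Booleans sum as 0/1, so this counts the dimensions graded at 4.
    return sum(v == 4 for v in values) >= 3
```

For example, a case with fours on shelter, food and clothing and ones elsewhere is predicted to lead to a finding, while fours on just two dimensions with threes elsewhere is not. Note that the rule gives a yes/no prediction rather than a probability, which matched how the children's aid society intended to use it.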
Again, I am not sure I thought of this as careful specification and testing of a prior at the time, but I would now. The phase of testing out the prior was extremely important. Some people involved in this type of judgemental predictive index work would regress the finalised subjective scores given to the fake cases against the finished index variables to improve the index weights or coefficients. I was completely opposed to this at the time, as it seemed like double use of data: the experts had used these scores to mentally revise the weights, and then an analyst was re-using them to revise the weights again. I am not sure what I would do today; maybe just partially revise them using some fractional power of the likelihood?
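To make the fractional-likelihood idea concrete, here is a toy sketch in the simplest conjugate setting, rather than the regression on index weights discussed above. It assumes a single unknown probability θ (say, the chance of a finding for neglect for one hypothetical category of case) with a Beta(α, β) prior, and binomial pseudo-data from fake-case scores. Raising the likelihood to a power a between 0 and 1, so that the posterior is proportional to L(θ)^a π(θ), is often called a power prior; the function name and all numbers here are illustrative only.

```python
from typing import Tuple


def fractional_beta_update(
    alpha: float, beta: float, successes: int, trials: int, power: float
) -> Tuple[float, float]:
    """Update a Beta(alpha, beta) prior with a binomial likelihood raised to `power`.

    Because L(theta)**power = theta**(power*y) * (1 - theta)**(power*(n - y)),
    the result is again a Beta distribution. power=0 ignores the data;
    power=1 is the ordinary Bayesian update.
    """
    if not 0.0 <= power <= 1.0:
        raise ValueError("power must lie in [0, 1]")
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return alpha + power * successes, beta + power * (trials - successes)
```

With a Beta(2, 2) prior and 8 of 10 hypothetical fake cases judged likely to lead to a finding, `power=0.5` gives Beta(6, 3) rather than the full-update Beta(10, 4): the scores move the prior, but count only half as much, which is one way to hedge against using the same judgements twice.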
I found the work very enjoyable and was disappointed it did not continue to be done within my group. It was, though, a very expensive way to generate a prior (say, $20,000 plus)!
The full “I wouldn’t start from here” joke: “‘Tis the divil’s own country, sorr, to find your way in. But a gintleman with a face like your honour’s can’t miss the road; though, if it was meself that was going to Letterfrack, faith, I wouldn’t start from here.”
So this is my current, fairly vague prior on what might be a profitable (to a scientific community) means of specifying priors today.