Andy McKenzie writes:
In their March 9 “counterpoint” in nature biotech to the prospect that we should try to integrate more sources of data in clinical practice (see “point” arguing for this), Isaac Kohane and David Margulies claim that,
“Finally, how much better is our new knowledge than older knowledge? When is the incremental benefit of a genomic variant(s) or gene expression profile relative to a family history or classic histopathology insufficient and when does it add rather than subtract variance?”
Perhaps I am mistaken (thus this email), but it seems that this claim runs contra to the definition of conditional probability. That is, if you have a hierarchical model, and the family history / classical histopathology already suggests a parameter estimate with some variance, how could the new genomic info possibly increase the variance of that parameter estimate? Surely the question is how much variance the new genomic info reduces and whether it therefore justifies the cost. Perhaps the authors mean the word “relative” as “replacing,” but then I don’t see why they use the word “incremental.” So color me confused, and I’d love it if you could help me clear this up.
We consider this in chapter 2 in Bayesian Data Analysis, I think in a couple of the homework problems. The short answer is that, in expectation, the posterior variance decreases as you get more information, but, depending on the model, in particular cases the variance can increase. For some models such as the normal and binomial, the posterior variance can only decrease. But consider the t model with low degrees of freedom (which can be interpreted as a mixture of normals with common mean and different variances). if you observe an extreme value, that’s evidence that the variance is high, and indeed your posterior variance can go up.
That said, the quote above might be addressing a different issue, that of overfitting. But I always say that overfitting is not a problem of too much information, it’s a problem of a model that’s not set up to handle a given level of information.