
Theories of information and interestingness

Jean-Luc pointed me to Anomaly Hunt; or, How To Write a Research Paper. This brings me to the vague topic of what is interesting. They say that you haven’t understood a concept until you have been able to explain it to something as dumb as a computer. For that reason, a lot of my past research has dealt with how to take a philosophical concept, such as “interaction”, and convert it into a mathematical device. It turned out that the interestingness of an interaction is captured by measuring the additional information we gain by allowing two variables to jointly affect the outcome.
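To make the interaction idea concrete, here is a minimal sketch, not the exact device from my research. It assumes discrete variables and empirical (plug-in) entropies, and it measures the gain as I(A,B;Y) - I(A;Y) - I(B;Y). The function names and the toy data are made up for illustration.

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of samples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def interaction_gain(a, b, y):
    """Extra information about y from using a and b jointly
    rather than one at a time: I(A,B;Y) - I(A;Y) - I(B;Y)."""
    joint = list(zip(a, b))
    return (mutual_information(joint, y)
            - mutual_information(a, y)
            - mutual_information(b, y))

# Toy data (hypothetical): all four combinations of two binary variables.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]

# XOR outcome: neither variable alone says anything, together they say everything.
y_xor = [u ^ v for u, v in zip(a, b)]
gain_xor = interaction_gain(a, b, y_xor)

# Outcome that simply copies a: b adds nothing, so there is no interaction.
y_copy = list(a)
gain_copy = interaction_gain(a, b, y_copy)

print(gain_xor, gain_copy)
```

In the XOR case the whole bit of information about the outcome appears only when the two variables are considered together, which is what makes that interaction interesting under this measure.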

The most interesting scientific articles are those that update our internal models of the world the most. An average case does not affect the model much, but an outlier or an extreme event will affect it more, because it updates our intuitive distribution more than an average case would.

There are two ways of measuring this change. The traditional approach is to employ Fisher information – an outlier will affect the estimated parameters more than would an average case. I feel uneasy about tying the change in a parameter with information. Instead, I prefer to quantify interestingness with the KL-divergence between the original and the updated models. The more the new data affected the internal model, the more interesting it was.
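As a rough sketch of the KL-divergence view, assume the internal model is a normal belief about an unknown mean, with known observation noise, so that the update and the divergence both have closed forms. The direction KL(updated || original) and all names and numbers below are assumptions chosen for illustration.

```python
import math

def update_normal(mean, var, x, noise_var=1.0):
    """Conjugate update of a Normal(mean, var) belief about an unknown mean
    after observing x, where x has known noise variance noise_var."""
    post_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    post_mean = post_var * (mean / var + x / noise_var)
    return post_mean, post_var

def kl_normal(m1, v1, m0, v0):
    """KL( Normal(m1, v1) || Normal(m0, v0) ) in nats."""
    return 0.5 * (math.log(v0 / v1) + (v1 + (m1 - m0) ** 2) / v0 - 1.0)

def interestingness(belief, x, noise_var=1.0):
    """Divergence of the updated belief from the original belief."""
    m0, v0 = belief
    m1, v1 = update_normal(m0, v0, x, noise_var)
    return kl_normal(m1, v1, m0, v0)

# Hypothetical original model: mean believed to be near 0, with unit variance.
original = (0.0, 1.0)

typical = interestingness(original, 0.1)  # an average case
outlier = interestingness(original, 4.0)  # an extreme case

print(typical, outlier)
```

Under these assumptions the extreme observation moves the belief much further than the average one, so it scores as more interesting.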

It is not just data that can update our beliefs. Model structure has this property too. For example, a new variable (such as force) that helps us understand dynamics in simple terms allows us to build predictive models from the original vagueness. A realization that two variables are independent simplifies our models. A realization that two variables are dependent makes our models more complex, but superior in their predictive accuracy. But when the structure of the model changes, including the number of parameters, it is necessary to account for the trade-off between cheap simplicity and more expensive accuracy.

But there are other sources of interestingness. For example, monkeys would pay more cherry juice to see a photo of a high-ranking monkey than of a subordinate monkey (the most cherry juice would be given up for photos of female monkey hindquarters). Our movies and our dreams also tend to be about important events, and not about average ones. The above theory of interestingness can account for such bias by replacing statistical divergence with changes in the utility of actions made with the model. This can involve some sort of model of what the future actions will be.

7 Comments

  1. Mark says:

    Interesting post (slight pun intended).

    Your paragraph on outliers updating our beliefs more than the average case reminded me of the paper Bayesian Surprise Attracts Human Attention from the NIPS conference of the year before last.

    In that paper, the authors model an agent's belief by a distribution over a set of models rather than a single, most likely model. Surprise at a new piece of data is then quantified as KL-divergence between the agent's prior distribution over all its models and its posterior distribution once the data is taken into account.

    I thought you might be interested in that paper if you haven't seen it already as it seems to echo some of your thoughts.

  2. Aleks says:

    Mark, thanks for this pointer! Indeed, this paper has the same definition as I am proposing, but elaborated properly. However, my last two paragraphs might go a bit further.

    I also feel a bit uneasy talking about prior and posterior when we have a single data item under consideration. Imagine 100 cases: to assess whether case 10 is an outlier, I would update the prior with the other 99 cases and use that as the original model, update this with case 10 to obtain the new model, and compare these two.

  3. Thom says:

    This seems very similar to Abelson (1995): he defined the interestingness of evidence as the degree of belief change it promoted (in his book "Statistics as Principled Argument").

    Thom

  4. Aleks says:

    Thom, thanks for this pointer. The earliest reference of this type that I'm familiar with is Gregory Bateson's definition of information as "a difference that makes a difference". Interestingness is very closely aligned with informativity.

  5. Paul Litvak says:

    The best paper I've read on the subject of interestingness of research is a paper by Murray Davis called "That's Interesting! Towards a Phenomenology of Sociology and a Sociology of Phenomenology", which I read for a behavioral econ class taught by George Loewenstein (I have to give him props as it's one of his favorite papers).

    You can find it online here:
    http://www.mang.canterbury.ac.nz/writing_guide/ma…

    The taxonomy goes far beyond just degree of belief change…

  6. Aleks says:

    Paul, this article does affirm the above definition that's posed in terms of belief change. In particular:

    "Interesting theories deny certain assumptions of their audience, while non-interesting theories affirm certain assumptions of their audience."

    Clearly, a theory leading to a change in one's assumptions is the same as changing the model, or providing data that leads to a change in the model. We're all in agreement, I think.

  7. Mark says:

    If I understand your second to last paragraph correctly, you are saying that when some new term enters our scientific vocabulary it can change the complexity and predictive power of our models and therefore make us reassess their value to us. This seems reasonable.

    Interestingly (that word again), the definition of a "wow" given in the paper I mentioned is symmetric with respect to models and data (Bayes' identity implies P(M|D)/P(M) = P(D|M)/P(D)).

    This means you can ask "how wow-ed am I by a particular piece of data when I consider a particular model". This may give some way of quantifying the effect of a change in scientific vocabulary.

    With regards to your unease, I don't quite follow your example. Is it meant to show why you are uneasy or how you would make yourself more comfortable?

    In principle, I don't think there is any reason why you can't update with respect to a single instance. Bayes' identity doesn't care how little data you have as long as you can compute the probabilities.