Someone writes in with the following question:

I’ve been studying Information Technology risk for some time now and so your work is of great interest. In IT risk we have several problems that a Bayesian approach would seem to help us address. Namely:

1.) We have only about 10 years of information

2.) The relevancy of that information changes somewhat quickly – sometimes weekly, sometimes monthly (thanks, Microsoft patch day) – so it’s difficult to take any empiricist approach.

3.) We have very small sample sizes (the details of a threat action are rarely shared information).

What I’m discovering is that:

1.) Lack of common definitions. Risk can be Threat can be Vulnerability can be Hazard, etc… Our standards bodies (ISO among them) aren’t helping here, just making this problem worse by committee-think.

2.) Most IT security folks (notice I didn’t use “IT risk”) have an engineering background and therefore a frequentist perspective. As such, they reject the notion that probabilities can be attached to risk.

3.) They love the garbage-in, garbage-out argument. Similarly, it is commonly argued that “opinions” cannot be useful information.

I believe that the use of Bayes has the ability to significantly improve our profession. I believe that there are very smart people in our profession. What is troubling is the amount of evangelism it is taking to educate even the most intelligent IT Security folks. That said, I have a couple of questions for you if you have the time to consider them.

1.) Taleb rails against the use of Gaussian distributions. Most smart IT security folks have read Taleb, and therefore discount the notion of using them. But didn’t Jaynes have a position that Gaussian was actually an appropriate distribution to use when the actual distribution was uncertain?

2.) How do you deal with the frequentists and the tendency to casually dismiss inference because of “garbage-in, garbage-out”? I’ve pointed out that “fraudulent” use of data to push an agenda is not limited to any particular discipline – probability theory or not. However, the frequentists are still disturbed at the idea of using their experience and then accounting for their (residual?) uncertainty.

3.) We define risk as a value derived by the probable frequency of a loss event, and the probable impact of that event. Are we insane in our attempt to attach probabilities to risk?

My reply:

Garbage in, garbage out is a real concern in statistical modeling and decision analysis. I discuss it a bit in this talk and in Chapter 22 of Bayesian Data Analysis. Classical decision theory does not always handle the GIGO problem well.

But I don’t see why single out Bayes! Any statistical method has assumptions. Maximum likelihood, for example, can be much more unstable than Bayes – that’s why Bayesian inference is sometimes called “regularization.” See here, for example.
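To see the regularization point concretely, here is a minimal sketch (my own toy numbers, not from the post): estimating an event probability from a tiny sample, where the maximum-likelihood estimate hits the boundary and a weak uniform prior pulls it back.

```python
# Toy example: 0 events observed in 5 trials.
successes, trials = 0, 5

# Maximum likelihood: degenerate at exactly 0 -- it claims the
# event is impossible, which is unstable for small samples.
mle = successes / trials

# Bayesian posterior mean under a uniform Beta(1, 1) prior:
# the posterior is Beta(successes + 1, trials - successes + 1),
# with mean (successes + 1) / (trials + 2).
posterior_mean = (successes + 1) / (trials + 2)

print(mle)             # 0.0
print(posterior_mean)  # ~0.143, shrunk away from the boundary
```

The prior acts like two extra pseudo-observations (one success, one failure), which is exactly the kind of stabilization people mean when they call Bayes a regularizer.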

Regarding Taleb and the Gaussian distribution, I actually had a discussion with him on this. The t distribution can be interpreted as a scale mixture of Gaussians (that is, a Gaussian distribution where the scale itself varies). I’ve used the Gaussian distribution a lot (see all the examples in our books) but the t is probably a better general choice.

Finally, I think it makes a lot of sense to attach probabilities to risks. You just have to recognize the models used in creating these probabilities. You should check the fit of the model (by comparing replicated data to observed data) and alter it as necessary. Low probabilities can be estimated by a combination of empirical work and theoretical modeling (for example, here is our paper on estimating the probability of events that have never occurred).
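A posterior predictive check of the kind described above can be sketched in a few lines. The data, the Poisson model, and the choice of the maximum as test statistic are all my own toy assumptions; the point is only the mechanics of comparing replicated data to observed data.

```python
import math
import random
import statistics

random.seed(2)
# Invented observed counts: mostly small, with one large outlier.
observed = [1, 0, 2, 1, 0, 1, 2, 0, 1, 12]

# Fit the simplest model -- Poisson with rate = sample mean --
# then draw replicated datasets and compare a test statistic.
lam = statistics.mean(observed)   # 2.0

def poisson(rate):
    # Knuth's algorithm; fine for small rates.
    L, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

T_obs = max(observed)             # test statistic: the maximum
reps = [max(poisson(lam) for _ in range(len(observed)))
        for _ in range(2000)]
p_value = sum(r >= T_obs for r in reps) / len(reps)
# A tiny p-value says replicated data almost never look as extreme
# as the observed data: the Poisson model misfits the tail, so we
# would alter it (e.g., allow overdispersion) and check again.
```

This is the check-and-alter loop in miniature: the model is not rejected in the abstract, it fails on a specific, interpretable feature of the data.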

Coincidentally, I was just reading this:

"Uncertainty is linked to the Bayesian idea of unknown outcomes and unknown underlying structures. Poker players face risk. The distribution of a deck of cards is known. The risk of the game comes with not knowing the exact outcome of the next draw. Investors, …, face risk and uncertainty. They do not know the exact outcome. More importantly, though, the underlying structure of the distribution is likewise unknown to some degree. Compared to the standard normal distribution often assumed by frequentists, a pure Bayesian analysis results in a “Student-t” distribution with significantly thicker tails."

More here, also with references to Taleb's Black Swans: http://www.gwagner.com/writing/2007/02/economics-…

and http://www.economicprincipals.com/issues/07.09.09…

Jaynes – in "Probability Theory: The Logic of Science," Chapter 7 – describes how Gaussian-based inference methods have enjoyed two centuries of success. But Jaynes goes further: he explains why the "Gaussian error law" is successful. This chapter also has a section titled "The near irrelevance of sampling frequency distributions".

It's impractical to summarize the entire chapter here. But one quote from Jaynes is worth reading.

"We cannot understand what is happening until we learn to think of probability distributions in terms of their demonstrable information content instead of their imagined (and as we shall see, irrelevant) frequency connections."

(I'm not a good stats person, so beware my flagrant ignorance. Also, apologies if the formatting is off – it looked fine in preview somehow.)

//

On this topic, I've been declared a frequentist (trying to learn what that means). However, I don't know if the label is correct because my arguments have tied into your final point, which was:

"You just have to recognize the models used in creating these probabilities. You should check the fit of the model (by comparing replicated data to observed data) and alter it as necessary."

//

If you have an extremely limited empirical dataset, which is highly context-specific, how does one check the fit of the model? This is where I, personally, think the GIGO argument comes into play. If your sample size is very small and each context has a different scale, it's unclear how you can consistently and repeatably create good probabilities. Once the probabilities are undermined, the rest of the risk model seems to collapse under the GIGO argument.

//

Does that make sense? If so, how does one get around it? Your comment "The t distribution can be interpreted as a scale mixture of Gaussians (that is, a Gaussian distribution where the scale itself varies)" sounds like it might address the "scale varies within each context" problem, but what about then applying a limited dataset (population)?

Ben,

You can indeed check the fit of a model from a single dataset. See Chapter 6 of Bayesian Data Analysis for some examples. With Bayes as with all other methods, there will always be some assumptions that you can't check–but you can check a lot. And then, realistically, almost every method does end up getting applied to multiple datasets. That's where Bayesian and frequentist ideas converge: the frequentist cares about repeated applications of a method, and the Bayesian thinks of this in terms of hierarchical models. But it's fundamentally the same concept, I think.
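The hierarchical idea above – many small, context-specific datasets informing each other – can be sketched with simple partial pooling. Everything here is an illustrative assumption: the contexts, the counts, and the prior "strength" of 10 pseudo-trials (a fixed choice, not a fitted value; a full hierarchical model would estimate it from the data).

```python
# Each "context" has events out of trials -- tiny samples, as in
# the IT-risk setting described above. Invented numbers.
contexts = {"web": (1, 8), "email": (0, 5), "endpoint": (3, 12)}

total_events = sum(e for e, _ in contexts.values())
total_trials = sum(t for _, t in contexts.values())
pooled = total_events / total_trials   # rate if we lump everything

# A shared Beta prior with mean equal to the pooled rate and a
# strength of 10 pseudo-trials (an assumption for this sketch).
strength = 10
a = pooled * strength

for name, (e, t) in contexts.items():
    raw = e / t                        # no pooling: noisy, can be 0
    shrunk = (e + a) / (t + strength)  # partial pooling toward 'pooled'
    print(f"{name}: raw={raw:.2f}, partially pooled={shrunk:.2f}")
```

Each context's estimate lands between its raw rate and the pooled rate, with the smallest samples shrunk the most – the hierarchical-model counterpart of the frequentist's "repeated applications of a method."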

Bayesians traditionally have not recognized that you can check model fit. (This was my frustration at the 1991 Bayesian meetings: people were not checking their models, and they were also insisting that they shouldn't be checking their models.)