A question about negative binomial regression

Posted on December 11, 2006 12:26 AM by Andrew

Sarah Croco writes:

A coauthor and I recently encountered a bit of uncertainty regarding an underlying assumption of the negative binomial regression (NBREG) and were wondering if anyone had any advice on how to proceed. Our question centers on whether the NBREG model is capable of handling interdependence between counts, and, if so, what kind of interdependence is it designed to capture?

In several texts authors suggest using an NBREG model instead of a Poisson model when overdispersion is present. In examples overdispersion is often attributed to one of two causal mechanisms. The first, which we call an “omitted variable effect”, occurs when there is some unobserved variable present in the data that makes some units/subjects have higher counts than others. A common example is the number of published papers an assistant professor produces in a year. We cannot assume the rate of publication is constant because professors will vary in their productivity for a number of reasons that are specific to each individual. A similar example has to do with how well sports teams perform across a season. Some teams will score at a higher rate than others because of a variable we cannot observe. In these examples, there is an interdependence within individual professors and within individual teams.
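The "omitted variable effect" can be illustrated with a small simulation. This is only a sketch under stated assumptions, not anything taken from the texts mentioned above: each unit gets its own unobserved rate drawn from a gamma distribution, and its count is Poisson given that rate. The function names and parameters (`simulate_counts`, `mean_rate`, `shape`) are hypothetical.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(n, mean_rate, shape, seed=0):
    """Counts for n units whose unobserved rates vary across units.

    Assumption: rates follow a gamma distribution with the given shape and
    mean `mean_rate`. A very large shape makes all rates nearly identical,
    which brings the counts close to a plain Poisson.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        unit_rate = rng.gammavariate(shape, mean_rate / shape)
        counts.append(poisson_draw(rng, unit_rate))
    return counts


def mean_and_variance(counts):
    """Return the sample mean and the (population-style) variance."""
    return statistics.fmean(counts), statistics.pvariance(counts)


if __name__ == "__main__":
    for shape in (1.5, 1e6):
        m, v = mean_and_variance(simulate_counts(20000, 3.0, shape, seed=1))
        print(f"shape={shape:g}: mean={m:.2f}, variance={v:.2f}")
```

Under this setup the theoretical variance is mean_rate + mean_rate**2 / shape, so the smaller the shape (the more the units differ), the further the variance rises above the mean.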

The second causal mechanism could be called “success breeds success”. In this case, the individual counts are not independent of one another because success in one period might encourage the subject to make another attempt. For example, a successful sales pitch on Wednesday for a door-to-door salesman may encourage him to try again on Thursday. Another example might be the number of violent episodes mentally ill patients undergo in a given year. One hypothesis might be that a violent episode in time t leads to an increased probability of a violent episode in time t+1 (a cathartic effect is also possible, where a violent episode in time t reduces the probability that the patient will undergo a violent episode in time t+1). Under this causal mechanism the contagion effect or interdependence is across time.
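A "success breeds success" process can be sketched in the same spirit. The version below is an assumption made for illustration, not a model of the salesman or patient examples: it uses a Polya-style urn in which every outcome makes the same outcome more likely in the next period (so failures also reinforce failure). All names and parameters (`successes_with_reinforcement`, `p_start`, `boost`) are hypothetical.

```python
import random
import statistics


def successes_with_reinforcement(n_periods, p_start, boost, rng):
    """Count successes over n_periods using a Polya-style urn.

    The urn starts with weight p_start on "success" and 1 - p_start on
    "failure"; each outcome adds `boost` to the weight of its own kind.
    boost = 0 gives independent trials (a plain binomial count).
    """
    success_w, failure_w = p_start, 1.0 - p_start
    successes = 0
    for _ in range(n_periods):
        if rng.random() < success_w / (success_w + failure_w):
            successes += 1
            success_w += boost
        else:
            failure_w += boost
    return successes


def simulate_subjects(n_subjects, n_periods, p_start, boost, seed=0):
    """Simulate the total count for each of n_subjects identical subjects."""
    rng = random.Random(seed)
    return [successes_with_reinforcement(n_periods, p_start, boost, rng)
            for _ in range(n_subjects)]


if __name__ == "__main__":
    for boost in (0.0, 0.05):
        counts = simulate_subjects(20000, 50, 0.1, boost, seed=1)
        print(f"boost={boost}: mean={statistics.fmean(counts):.2f}, "
              f"variance={statistics.pvariance(counts):.2f}")
```

Every subject here starts out identical, yet the totals are far more variable with reinforcement than without it, even though the mean is unchanged. Overdispersion in the totals alone therefore does not reveal whether it came from differences between subjects or from dependence across time.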

After searching the literature, we are left with two questions.

1. Are NBREG models meant to handle interdependence? (While there seems to be a consensus that the answer is “yes”, several publications suggest the exact opposite. One paper, in fact, went to great lengths to demonstrate why and how current NBREG models need to be modified to be capable of handling non-independence.)

2. If NBREG models can handle non-independence, which kind of non-independence are they meant to handle? Interdependence within subjects, where there is some omitted variable that would account for why some subjects have higher counts than others, or interdependence across time, where a success in time t leads to a second attempt in time t+1?

5 thoughts on “A question about negative binomial regression”

Jack Weiss on December 11, 2006 at 11:09 am said:

There are at least 12 distinct probabilistic processes that can give rise to a negative binomial distribution (Boswell and Patil, 1970). In my field, statistical ecology, three of these are often applicable: (1) heterogeneity in the Poisson intensity parameter (the negative binomial arises as a gamma mixing distribution for a heterogeneous Poisson distribution), (2) grid sampling from a clustered population (the negative binomial arises as a generalized Poisson model with Poisson distributed clusters and log series counts in a cluster), and (3) the outcome probability changes depending on the process history (the negative binomial arises as a limiting distribution of a Polya-Eggenberger urn model). The causal mechanisms you mention could be interpreted as examples of the first and third of these processes. Separate from these theoretical considerations, the use of a negative binomial model can also be motivated by the nature of the mean-variance relationship of the response. I discuss some of these issues in a course I teach. See, e.g., the lecture notes for lectures 4-8 at http://www.unc.edu/courses/2006spring/ecol/145/00…
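
For readers who want process (1) spelled out, the standard gamma-mixture calculation can be sketched as follows. The notation is an assumption added here, not taken from the comment: Y is the count, \lambda the unit-specific Poisson rate, \mu its mean, and k the gamma shape parameter.

```latex
% Assumed notation: Y = count, \lambda = unit-specific Poisson rate,
% \mu = mean rate, k = gamma shape parameter.
Y \mid \lambda \sim \mathrm{Poisson}(\lambda), \qquad
\lambda \sim \mathrm{Gamma}\!\left(\text{shape } k,\ \text{scale } \mu/k\right)

% Integrate the Poisson probability over the gamma density of \lambda.
P(Y = y) = \int_0^\infty \frac{e^{-\lambda}\lambda^{y}}{y!}\,
  \frac{(k/\mu)^{k}\,\lambda^{k-1}e^{-k\lambda/\mu}}{\Gamma(k)}\,d\lambda
  = \frac{\Gamma(y+k)}{\Gamma(k)\,y!}
    \left(\frac{k}{k+\mu}\right)^{k}\left(\frac{\mu}{k+\mu}\right)^{y}

% Mean and variance of the resulting negative binomial.
\mathrm{E}(Y) = \mu, \qquad \mathrm{Var}(Y) = \mu + \frac{\mu^{2}}{k}
```

The variance exceeds the mean by \mu^2 / k, and the Poisson is recovered as k grows without bound.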
Barry on December 12, 2006 at 5:16 am said:

I'm flying blind here (I don't have a copy with me), but the book 'Univariate discrete distributions' (Johnson, Kotz and Kemp) should be worth browsing for further information on the distribution. It's one of those books which, in an ideal world, would be on every statistician's shelf.
Peter on December 13, 2006 at 12:21 pm said:

I think it depends what you mean by 'handle'.

In the sense that the interdependence can be dealt with as a hidden variable, yes, it deals with it. But we can do much better. The 'hidden variable' could be anything leading to overdispersion.

If you look at nonlinear mixed models, then you can include the time variable as a random effect. So, with professors' publications, you could model papers per year and give each professor his/her own intercept and slope. Then, within each year, you might STILL want NB, but you might get away with Poisson.
Michael Brown on August 8, 2007 at 6:33 pm said:

Joseph Hilbe's book, Negative Binomial Regression (Cambridge University Press), should answer some of the questions raised in this discussion.
Joseph Hilbe on September 7, 2007 at 6:54 am said:

I address this issue in my recently released book, Hilbe, Joseph M., Negative Binomial Regression (Cambridge University Press, 2007). Basically, the negative binomial can be used to model unidentified correlation in the data, regardless of the cause. When we can identify the reason for the extra correlation, then one can use a model appropriate for the data, which may be a negative binomial, or not. Of course, there are a variety of negative binomial models, each of which addresses certain types of data situations. Note also that, like the Poisson, the negative binomial can be overdispersed as well. Typically in such situations one can use a random intercept, or coefficient, or a host of other adjustments. I noticed that one of the statisticians commenting on this query asserts that the negative binomial is a type of Poisson-gamma mixture. The NB-2 (traditional version) and NB-1 (constant dispersion) can be derived in that manner, but the negative binomial need not be considered in that manner at all. But this is all discussed in the book. Joseph Hilbe
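
As a brief aside on the two parameterizations named above, their variance functions as they are commonly written are sketched below. The symbols are assumptions added here, not quoted from the comment: \mu is the conditional mean and \alpha \ge 0 a dispersion parameter.

```latex
% Assumed notation: \mu = conditional mean, \alpha \ge 0 = dispersion parameter.
% NB-2 (traditional): the variance is quadratic in the mean.
\mathrm{Var}_{\text{NB-2}}(Y) = \mu + \alpha\,\mu^{2}

% NB-1 (constant dispersion): the variance is proportional to the mean.
\mathrm{Var}_{\text{NB-1}}(Y) = \mu + \alpha\,\mu = (1 + \alpha)\,\mu
```

Both reduce to the Poisson variance \mu when \alpha = 0. In NB-1 the variance-to-mean ratio is the constant 1 + \alpha, which is the sense in which its dispersion is constant.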
