Using sample size in the prior distribution

Mike McLaughlin writes:

Consider the Seeds example in vol. 1 of the BUGS examples. There, a binomial likelihood has a p parameter constructed, via logit, from two covariates. What I am wondering is: Would it be legitimate, in a binomial + logit problem like this, to allow the binomial p[i] to be a function of the corresponding n[i], or would that amount to using the data in the prior? In other words, in the context of the Seeds example, is r[] the only data, or is n[] data as well and therefore not permissible in a prior formulation?

I [McLaughlin] currently have a model with a common beta prior for all p[i] but would like to mitigate this commonality (a kind of James-Stein effect) when there are lots of observations for some i. But this seems to feed the data back into the prior. Does it really?

It also occurs to me [McLaughlin] that perhaps a binomial likelihood is not the right one to use here (not flexible enough).
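To make the setup concrete, here is a rough sketch of the kind of model McLaughlin describes: a common beta prior on all the p[i], with a binomial likelihood. It is written in PyMC rather than BUGS, and the data values and hyperprior choices below are placeholders, not the actual Seeds numbers:

```python
import numpy as np
import pymc as pm

# Made-up data in the style of the Seeds example:
# r[i] successes out of n[i] trials in group i.
r = np.array([10, 23, 26, 17, 5])
n = np.array([39, 62, 51, 39, 6])

with pm.Model() as common_prior_model:
    # Hyperparameters of the shared beta prior (placeholder hyperpriors).
    a = pm.Gamma("a", alpha=2.0, beta=0.5)
    b = pm.Gamma("b", alpha=2.0, beta=0.5)

    # Every p[i] gets the same beta prior; this is the commonality
    # McLaughlin would like to relax when n[i] is large.
    p = pm.Beta("p", alpha=a, beta=b, shape=len(n))

    y = pm.Binomial("y", n=n, p=p, observed=r)
    idata = pm.sample()
```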

My reply:

Strictly speaking, “n” is data, and so what you want is a likelihood function p(y,n|theta), where theta represents all the parameters in the model. In a binomial-type example, it would make sense to factor the likelihood as p(y|n,theta)*p(n|theta). Or, to make this even clearer: p(y|n,theta_1)*p(n|theta_2), where theta_1 are the parameters of the binomial distribution (or whatever generalization you’re using) and theta_2 are the parameters of the model for n. The vectors theta_1 and theta_2 can overlap. In any case, the next step is the prior distribution, p(theta_1,theta_2). Prior dependence between theta_1 and theta_2 induces a model of the form you’re talking about.
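Here is a minimal sketch of that factorization, assuming (purely for illustration) a Poisson model for the n[i] and a hypothetical coupling coefficient rho that induces the prior dependence between theta_1 and theta_2; none of these particular choices come from the original discussion:

```python
import numpy as np
import pymc as pm

r = np.array([10, 23, 26, 17, 5])   # same made-up data as above
n = np.array([39, 62, 51, 39, 6])

with pm.Model() as joint_model:
    # theta_2: parameter of the model for n (an assumed common Poisson rate).
    log_mu = pm.Normal("log_mu", mu=3.0, sigma=1.0)
    n_model = pm.Poisson("n_model", mu=pm.math.exp(log_mu), observed=n)

    # theta_1: parameters of the binomial part. The prior mean of the
    # logits depends on log_mu through rho, so theta_1 and theta_2 are
    # dependent a priori. rho is a hypothetical coupling coefficient.
    rho = pm.Normal("rho", mu=0.0, sigma=1.0)
    eta = pm.Normal("eta", mu=rho * log_mu, sigma=1.0, shape=len(n))
    p = pm.Deterministic("p", pm.math.invlogit(eta))

    # p(y | n, theta_1): the binomial likelihood, conditional on n.
    y = pm.Binomial("y", n=n, p=p, observed=r)
```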

In practice, I think it can be reasonable to simplify a bit and write p(y|n,theta) and then use a prior of the form p(theta|n). We discuss this sort of thing in the first or second section of the regression chapter in BDA. Whether you treat n as data to be modeled or as data to be conditioned on, you can build the dependence between n and theta into the model either way.
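A sketch of that simpler route, where n is never modeled and enters only through the prior p(theta|n). Here a hypothetical parameter gamma (my name, not from the post) lets the prior spread of the logits grow with log n[i], so that groups with more observations are constrained less by the common prior:

```python
import numpy as np
import pymc as pm

r = np.array([10, 23, 26, 17, 5])   # same made-up data as above
n = np.array([39, 62, 51, 39, 6])

with pm.Model() as conditional_prior_model:
    mu = pm.Normal("mu", mu=0.0, sigma=2.0)

    # gamma controls how strongly the prior scale depends on n[i];
    # the functional form 0.5 + gamma * log(n) is just one possibility.
    gamma = pm.HalfNormal("gamma", sigma=1.0)
    sigma_i = 0.5 + gamma * np.log(n)   # prior depends on n: p(theta | n)

    eta = pm.Normal("eta", mu=mu, sigma=sigma_i, shape=len(n))
    p = pm.Deterministic("p", pm.math.invlogit(eta))

    # y is conditioned on n, which is otherwise left unmodeled.
    y = pm.Binomial("y", n=n, p=p, observed=r)
```

The point of the sketch is just that the prior for eta depends on n even though n itself is never given a distribution; whether that dependence is substantively justified is the modeling question raised in the post.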