Bayesian modeling for kidney filtering

Chris Schmid (statistics, New England Medical Center) writes:

We’re trying to make a prediction equation for GFR which is the rate at which the kidney filters stuff out. It depends on a bunch of factors like age, sex, race and lab values like the serum creatinine level. We have a bunch of databases in which these things are measured and know that the equation depends on factors such as presence of diabetes, renal transplantation and the like. Physiologically, the level of creatinine depends on the GFR but we can measure creatinine more easily than GFR so want the inverse prediction. Two complicating factors are measurement error in creatinine and GFR as well as the possibility that the doctor may have some insight into the patient’s condition that may not be available in the database. We have been proceding along the lines of linear regression, but I suggested that a Bayesian approach might be able to handle the measurement error and the prior information. I’m attaching some notes I wrote up on the problem.

So, we have a development dataset to determine a model, a validation set to test it on and then new patients on whom the GFR would need to be predicted as well as some missing data on potential important variables. What I am not clear about is how to use a prior for the prediction model, if this uses information not available in the dataset. So we’d develop a Bayesian scheme for estimating the posteriors of the regression coefficients and true unknown lab values but would then need to apply it to single individuals with measure of creatinine and some covariates. The prior on the regression parameters would come from the posterior of the data analysis, but wouldn’t the doctor’s intuitive sense of the GFR level need to be incorporated also and since it’s not in the development dataset, how would that be done? It seems to me that you’d need a different model for the prediction than for the data analysis. Or is it that you want to use the data analysis to develop good priors to use in a new model?

A Bayesian approach would definitely be the natural way to handle the measurement error. I would think that substantive prior information (such as doctor’s predictions) could be handled in some way as regression predictors, rather than directly as prior distributions. Then the data would be able to assess, and automatically calibrate, the relevance of these predictors for the observed data (the “training set” for the predictive model).

Any other thoughts?