Multilevel model

Gregor Gorjanc writes,

My colleague has biological data that measure the degree of DNA damage
in cells (Olive Tail Moment – OTM). These data are gathered via the
so-called comet assay and are used to detect the genotoxicity of various
chemicals, environmental waters, soil, … The test (this is a very
imprecise description, but it should show the point) is conducted as
follows: say we take blood samples from 10 animals, where the first 5
animals are under the treatment of interest and the other five are used
as controls. A specified type of cell is extracted from each blood
sample, “processed,” and finally a set of those cells (usually around
100) is scored for OTM.

Now my questions are related to analysis of such data:
– Suppose we take two samples from each animal, so that we can include
the effect of replicate in the model. I believe that we can safely
assume that replicates come from a normal distribution, so specifying a
hierarchy for this effect makes sense. However, with only two levels,
the posterior will be improper. What can I do?

– There are not many papers from which one could “learn” about the
statistical analysis of OTM data. One author wrote some guidelines, and
one of them is that the data should be modeled with the animal as the
statistical unit, i.e., compute some statistic of the scored cells per
animal and analyse this statistic. Besides leaving out the
variability/precision of this statistic (although this could be
incorporated into the analysis, as in the 8 schools example), I cannot
get the general picture about the statistical unit. Is the animal really
the unit? You can imagine that using multiples of 100 records can be
different from using multiples of a chosen statistic of 100 records.
I also have another set of data, where the animals are replaced by soil
samples and these samples are applied to the medium of some cell
culture, say some bacteria. As before, a set of cells is scored for OTM.
Is a soil sample now the statistical unit, or should we take the cell as
the unit?

My reply:

1. If you have 2 samples from each animal, you have more than enough data to estimate a hierarchical model. The posterior distribution for the measurement variance will be proper. It would only be improper if you estimated, completely independently, a separate variance for each animal. But there’s no reason to do that. If you wanted to let the variance vary by animal (which you probably wouldn’t do, since you don’t have enough data to easily distinguish these variances), you’d want to model the variances themselves hierarchically.

2. A multilevel model (treatments at the animal level, two samples from each animal, cells nested within samples) is best. However, in practice it’s probably ok to just average the 200 measurements for each animal and then do a classical analysis, since the treatment is at the animal level.
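On point 1, a rough way to see why two samples per animal are enough: if the within-animal variance is shared across animals, each pair of samples contributes one degree of freedom, so J animals give J degrees of freedom in total. The sketch below is a classical moment estimate, not the posterior itself, and the function name and the numbers are made up for illustration.

```python
import statistics

def pooled_within_variance(pairs):
    """Pooled within-animal variance from duplicate samples.

    Each (a, b) pair has within-pair sum of squares (a - b)**2 / 2 on one
    degree of freedom, so the pooled estimate divides the total by J.
    """
    j = len(pairs)
    return sum((a - b) ** 2 for a, b in pairs) / (2 * j)

# Hypothetical sample-level summaries: two samples for each of three animals.
pairs = [(1.0, 3.0), (2.0, 2.0), (5.0, 9.0)]

pooled = pooled_within_variance(pairs)

# Estimating a separate variance for each animal leaves one degree of
# freedom apiece; averaging those estimates recovers the pooled value.
per_animal = [statistics.variance(p) for p in pairs]

print(pooled, statistics.fmean(per_animal))
```

The per-animal variances here are each based on a single difference, which is the situation where a completely independent variance for each animal is poorly determined. Sharing one variance, or modeling the variances hierarchically, is what pools the information.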

An example with similar structure is in Section 2.2.2 of this paper.
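The shortcut in point 2, averaging the 200 cell scores for each animal and then comparing treated and control animals, can be sketched as follows. This is a minimal illustration under stated assumptions: the data are simulated from normal distributions with arbitrary effect sizes and standard deviations (real OTM scores are nonnegative and typically skewed), the helper names are hypothetical, and the "classical analysis" is taken to be an equal-variance two-sample t statistic.

```python
import random
import statistics

def animal_means(cells_by_animal):
    """Collapse each animal's cell scores to a single mean."""
    return [statistics.fmean(cells) for cells in cells_by_animal]

def pooled_t(group_a, group_b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / df
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    return diff / se, df

rng = random.Random(0)  # fixed seed so the simulation is repeatable

def simulate_animal(treated):
    """Simulate 2 samples x 100 cells for one animal (illustrative values)."""
    animal_level = rng.gauss(2.0 if treated else 0.0, 1.0)
    cells = []
    for _ in range(2):  # two samples per animal
        sample_level = animal_level + rng.gauss(0.0, 0.5)
        cells.extend(sample_level + rng.gauss(0.0, 3.0) for _ in range(100))
    return cells

treated = [simulate_animal(True) for _ in range(5)]
control = [simulate_animal(False) for _ in range(5)]

# The animal is the unit of analysis: 5 means per group, 8 degrees of freedom.
t_stat, df = pooled_t(animal_means(treated), animal_means(control))
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

The degrees of freedom come from the 10 animals, not the 2,000 cells, which is the sense in which the animal is the statistical unit when the treatment is applied at the animal level.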