Via Tom LaGatta, Boris Glebov writes:
My labmates have statistics problem. We are all experimentalists, but need an input on a fine statistics point.
The problem is as follows. The data set consists of photon counts measured at a series of coordinates. The number of input photons is known, but the system transmission (T) is not known and needs to be estimated. The number of transmitted photons at each coordinate follows a binomial distribution, not a Gaussian one.
The spatial distribution of T values it then fit using a Levenberg-Marquart method modified to use weights for each data point.
At present, my labmates are not sure how to properly calculate and use the weights. The equations are designed for Gaussian distributions, not binomial ones, and this is a problem because in many cases the photon counts are near the edge (say, zero), where a Gaussian width is nonsensical.
Could you recommend a source they could use to guide their calculations?
My reply:
I don’t know anything about this (although I assume I could figure out a good answer easily enough if I knew more about the model). I just thought this was worth sharing, partly because maybe some readers have a good answer and partly as an example of the wide variety of terminology used in different statistical applications.
My general advice in this sort of problem is to forget about the weights as weights and instead think about where they came from, and include in the model (typically in a likelihood function or a poststratification summary) the information that went into the weights.
- Is the gaussian that’s being estimated the distribution of photons counts across all coordinates? Sounds like a roundabout way of imposing structure here
- By “the number of input photons” being known and the counts being binomial, does that mean that the number of input photons per coordinate is known and each coordinate is binomial with N being the input photons per coordinate?
- If I’m understanding, the system transmission could be described by the p parameters of the binomial over all coordinates, is that correct?
If this is a correct description, it seems like one way to estimate everything is to have a beta binomial, where there’s a separate binomial parameter for each position. The # bernoulli trials is different for each coordinate and is given as part of the data (assuming my description is correct). The shape of the distribution of binomial parameters is given by the beta distribution and estimated from the data with some hyperparameter distribution on the beta parameters. This doesn’t capture any spatial dependence between the coordinates, but seems to roughly follow what you describe, although maybe I’m not understanding the system/problem correctly. This would borrow strength across coordinates, although you’d probably want to check that the beta distribution is flexible enough for your data and incorporate additional spatial relationships if there are any.
Here’s a general approach to approximating a binomial distribution with a Gaussian one that avoids the issue of the bounded parameter space.
Step 1: stabilize the zero counts by adding pseudo-data to the actual data, e.g., one extra success in two extra trials. (You don’t need to use integer pseudo-data.) This is equivalent to using a beta prior distribution for the probability of transmission.
Step 2: express the (stabilized) log-likelihood in terms of the log-odds of transmission instead of the probability of transmission. (Under the log-odds transformation the interval [0, 1] is mapped to the whole real line.)
Step 3: approximate the log-likelihood using a second-order Taylor series expansion centered at the maximum. A second-order polynomial log-likelihood corresponds to a Gaussian distribution.
Step 4: pretend your approximation is exact and proceed with the estimation. You’ll be estimating log-odds of transmission, not probability of transmission, but you can reverse the log-odds transformation when you’re done.
Having worked on something mildly similar, my guess is the experiment goes something like this: The experimenter would like to understand the property or structure of some substance. The experiment is that they have some laser or x-ray that shoots photons towards the substance which lies in some (presumably spherical) photon detector. After hitting the substance, the photons will be deflected in all sorts of directions and detected/counted by this spherical detector. The data probably looks like the number of photons detected at each location in the sphere. The number and probability that a photon may or may not arrive at some point in the sphere (where the binomial distribution comes in) is given by how many photons were shot by the laser (known) and how the photons are deflected, which is given by the unknown weights of the transmission function (which we are trying to estimate) which can be used to infer the structure of the substance under investigation.
If the transmission function is known up to the weights, just throw it in the model:
Yij ~ Multinom(n, Wij * Transmission Function Kernel)
Wij ~ prior
Yij is multinomial with number of input photons, n, and the weights of the transmission function Wij detected at coordinate i,j
Wij is whatever prior, but hopefully somewhat informative (the experimenter could use like substances)
how many photons were shot by the laser (known)
This is almost certainly false. One almost certainly knows the mean laser output rather than the number of photons that were emitted in this pulse, in which case all those binomial distributions are actually Poisson.
This makes the problem easier.
I agree with Sam. Isn’t this a Poisson distribution?
As Sam notes, the distribution is Poisson, not binomial. My reference on LM code is the algorithm from Numerical Recipes, designed for fixed weights and a gaussian objective function. It’s easy to replace with a Poisson one, though — change the chisq function to add 2*(y_model – y_data + log(y_data / y_model)) for points with non-zero y_data, 2 * y_model when y_data is zero, and bound y_mod away from zero with an extra penalty term. The derivative and 2nd order terms have to be recomputed accordingly as well. With minor modification, the NR authors’ recipe works perfectly well. I’ve used it for fitting mortgage prepayment models, where the binomial would actually be correct, but the numbers are large enough so the Poisson is fine.
Sam and Oren: The Poisson distribution holds for some states of the electromagnetic field. It’s very possible for a laser to be in non-Poissonian states. Also, detection electronics often produces non-Poissonian binomial counting statistics even when the underlying photon distribution is Poissonian.
See, e.g., D. Stoler, B.E.A. Saleh, and M.C. Teisch, “Binomial States of the Quantized Radiation Field,” Opt. Act. 32, 345 (1985).
And even when the photons themselves are Poisson-distributed, see D.J. Kissick, R.D. Muir, and G.J. Simpson, “Statistical Treatment of Photon/Electron Counting: Extending the Linear Dynamic Range from the Dark Count Rate to Saturation,” Anal. Chem. 82, 10129 (2010), on how using discriminators in photon-counting produces binomial counting statistics even when the underlying distribution of photons is Poissonian.
For low-efficiency detectors, the binomial counting distribution becomes Poissonian, but for high-efficiency detectors it may not be.
See also, R.D. Muir, D.J. Kissick, and G.J. Simpson, “Statistical Connection between Binomial Photon Counting and Photon Averaging in High-Dynamic Range Beam-Scanning Microscopy,” Opt. Express 20, 10406 (2012).
Thank you all for your replies and suggestions!
Number of photons:
We were actually looking at two situations. The first is interrogating the sample using a coherent state, in which the number of photons follows a Poisson distribution with a known mean. In the second, we were using number (or Fock) states, in which the number of photons is exactly known every time.
Binomial distribution of results:
Basically, if we assume no scattering and a linear medium, each photon has probability T of being transmitted. So when running the model, I assume binomial distribution was used to produce the probabilities used in producing the outputs. (I did not write the actual simulation code, and only participated in general discussions.)
This parameter T (or p, in more generic nomenclature) is what needs to be estimated at each coordinate, and then given an uncertainty weight.
Detection:
Detectors were assumed to behave nicely. Their efficiency may have been varied, but at least they were linear and did not have other tricks. So for a detector with less than unity efficiency, reported photons would also follow a binomial distribution based on the same input.
Coordinate dependence:
I’m not actually sure what the coordinate dependence was. It was likely some higher-order Laguerre-Gaussian, but can’t say for certain. I think it’s a secondary question.
Again, thank you for your contributions. I will forward this thread to the guys who were actually working on this problem directly. Maybe they will provide some further information.