It all began with this message from Christopher Bonnett:
I’m an observational cosmologist and I am writing you as I think the following paper + article might be of interest for your blog.
A fellow cosmologist, Fergus Simpson, has done a Bayesian analysis on the size of aliens; it has passed peer review and has been published in Monthly Notices, a reputable astronomy journal. (Just to make clear that it’s not some crackpot theory.) Fergus has also written a blog post with a less technical explanation of his research.
As the article is very heavy on bayesian analysis, I thought it might be an interesting topic for you and your co-bloggers to read and potentially blog about.
Indeed, this reminded me of the example on page 90 of this paper.
I see two problems with Simpson’s analysis, problems that are serious enough that it all just seems wrong to me. My first problem is the prior or distribution of aliens, and my second problem is the likelihood or model for what is observed. I don’t actually see any prior at all in the paper; I guess there must be some implicit prior, but otherwise it would seem hard for anyone to say much from a sample of size 1 with no prior. [No, I was wrong, there is a prior; see P.P.S. below. — AG] As for the likelihood, the assumption seems to be that we, or Fergus Simpson in writing this paper, are being randomly sampled from all creatures in the universe, or I guess all creatures that have the ability to think these thoughts, or maybe all creatures that have the ability to get a paper published in an astronomy journal, or . . . whatever. It doesn’t make sense to me.
I forwarded the discussion to David Hogg, who wrote:
Amusing idea, but there is only so far you can go in these “we have one data point” studies. That said, we want to pick targets for spectroscopy and imaging carefully, since each imaged planet is likely to cost upwards of $50 million to image (see current plans for terrestrial planet imaging).
Also, why didn’t he also use the data point that Mars (which is also habitable-zone) doesn’t have intelligent life on it? That might be more informative than the datum that Earth does have life.
That’s pretty much it for me, but Fergus Simpson did reply, so I’ll also include some of our back-and-forth discussion:
Andrew – you mentioned a concern about the “prior or distribution of aliens”, which I presume refers to the distribution of population sizes? It may sound surprising, but the secondary inferences (e.g. the sampling bias on planet size and body size) are insensitive to this function. For example, consider dividing all inhabited planets into large (above median) and small (sub-median) size. Now if we suppose that the mean population of the large planets is four times greater than the mean population of the small planets, then the odds of living on a large planet are 4:1. No matter how one chooses to distribute the aliens amongst the individual planets, the odds remain 4:1.
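Simpson’s 4:1 invariance claim is easy to check by simulation. Here’s a quick sketch (the population numbers and distributions are made up for illustration; none of this is from the paper):

```python
import random

random.seed(0)

def odds_large(pop_small, pop_large):
    """Odds that a randomly chosen individual lives on a large planet:
    just the ratio of the total populations."""
    return sum(pop_large) / sum(pop_small)

n = 1000  # planets per size class

# Case 1: every planet in a class holds exactly the class mean population.
small = [100] * n
large = [400] * n
print(odds_large(small, large))  # 4.0

# Case 2: wildly unequal populations within each class, same class means.
small = [random.expovariate(1 / 100) for _ in range(n)]
large = [random.expovariate(1 / 400) for _ in range(n)]
print(odds_large(small, large))  # close to 4, up to sampling noise
```

The odds depend only on the class totals, so how the individuals are distributed among planets within a class is indeed irrelevant, just as Simpson says.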
David – regarding your comment on Mars – it is certainly an interesting case. To make use of this datapoint one could decompose the size distribution of inhabited planets p(r) in terms of a product of the total number of planets n(r) and the habitation fraction i(r) such that
p(r) ∝ n(r) · i(r)
However there are a couple of problems – the first is that we don’t know n(r) very well for r < 1 Earth radius, and the second is that i(r) is probably very much less than unity. So finding a single example (Mars) devoid of intelligent life isn’t very informative. So instead of using that formulation, the approach I adopted was to jump straight to the (very unknown) distribution of inhabited planets p(r), and marginalise over different means and standard deviations of that function with some reasonable sets of priors.

Regarding the idea that “there is only so far you can go in these ‘we have one data point’ studies”: This is something I’ve heard mentioned frequently, but have yet to see compelling evidence for! If one performs a single roll of a fair die, then nothing is learned of what is written on the other faces. However, if one rolls a loaded die – where the probability of rolling each face is weighted by the number shown on the face – it’s a different story. If you roll a 10 with this loaded die, you can suddenly be confident that all the other faces are not much greater than 10. That’s essentially what’s happening here.
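Simpson’s loaded die is what statisticians call size-biased sampling, and a small simulation makes the intuition concrete (the face values below are invented for illustration):

```python
import random

random.seed(1)

faces = [1, 2, 3, 5, 8, 10]  # hypothetical numbers written on the faces

def roll_loaded(faces):
    """Roll a die where each face comes up with probability
    proportional to the number written on it."""
    return random.choices(faces, weights=faces, k=1)[0]

rolls = [roll_loaded(faces) for _ in range(100_000)]
mean_roll = sum(rolls) / len(rolls)
mean_face = sum(faces) / len(faces)
# The size-biased roll averages well above the plain face average.
print(mean_face, mean_roll)
```

Because large faces are over-represented among the rolls, a single observed roll tends to sit near the top of the face distribution, which is why one size-biased draw can bound the other faces. The same logic is what puts a random observer, probabilistically, on a high-population planet in Simpson’s argument.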
The prior distribution or base rate has to come into the calculation; it’s just basic probability theory. The other issue is that we are not a random sample of creatures in the universe, or even a random sample of sentient creatures. So I don’t think your analogy with countries of the world works out. In your analogy, one is picking a random person among all humans on earth. But in the planet example, we’re just us, we’re not randomly-selected critters.
In equation (5) I employ a reference prior as denoted by pi(mu, sigma^2). Also Figure 1 illustrates the different outcomes one reaches when adopting different prior distributions. If there’s a particular equation that’s troubling you, please let me know.
Regarding the sampling model, consider the following thought experiment – imagine humans had colonised other planets, as well as other continents. If one was uninformed which planet you lived on, it would still be reasonable to weight one’s probability in the same manner as countries – i.e. more likely to live on a higher population planet. Now, if those colonies on other planets had not travelled from earth, but evolved independently, would this reasoning change, and why?
Your conclusion about planet sizes is (as I see it) that you expect them to be near Earth size, and your conclusion about animal sizes (as I see it) is that they will be near human sizes. Of course you get a distribution, but it is peaked near the Earth and its shape is set by your prior, really, not your likelihood. So the data are just saying “I expect things to be like the single data point I have seen, with some skew towards regions of higher prior probability”. That is interesting, but limited.
Another way to say it: The likelihood principle says that all the *new* information is coming from the likelihood. The likelihood, with one data point, is not very informative. The paper would have the same content if all you had plotted is the likelihood function, and you would have found that it is very broad, I expect.
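The breadth of a one-observation likelihood is easy to see in a toy normal model (my example, not from the paper). With a single data point x, the log-likelihood of (mu, sigma) is, up to a constant:

```python
import math

x = 1.0  # a single observation, e.g. Earth's radius in Earth units

def loglik(mu, sigma):
    """Log-likelihood of (mu, sigma) given the one observation x,
    dropping the constant -0.5 * log(2 * pi)."""
    return -math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

# Along mu = x the log-likelihood is just -log(sigma): it rises without
# bound as sigma -> 0 and falls off only logarithmically as sigma grows,
# so sigma is essentially unidentified by the data alone.
print(loglik(x, 0.1), loglik(x, 1.0), loglik(x, 10.0))
```

Any proper posterior for sigma therefore has to get its shape almost entirely from the prior, not from the single data point.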
Me (to Simpson):
I still don’t buy the random sampling assumption. And once you start thinking of random sampling, why sample creatures at random? Why not sample molecules, or cells, or, to go at it from the other way, families, or societies? To put it another way, in your countries example the key assumption is that a human is being selected at random.
So my perspective would be this – I’m experiencing a stream of consciousness originating from a neural network which is housed within a shell (my body). And I observe there to be many others such as yourself who are experiencing the same phenomenon. I have no idea how I ended up in this particular shell. It doesn’t appear to be special (people who are over 7 feet tall might decide otherwise!). As I see it the shell itself is irrelevant – what matters is the neural network which triggers consciousness. And our neural networks are all extremely similar. So my belief is that each shell which houses a conscious mind was equally probable. But if you happen to believe that you were more likely to inhabit some shells than others, that is your prerogative.
As I see it, Simpson is postulating random sampling, and that’s just a model he has; I don’t see how it applies at all. He’s got an urn model but there ain’t no urn. Of course, as a practitioner of likelihood-based statistics, I work with such models all the time (you didn’t think those Poisson distributions and normal distributions and logistic regression models were real, did you?), but the point is that the model needs some justification, and in this case I don’t see any at all! What I see is an argument by analogy that doesn’t really make sense. I suppose that to really shoot it down I’d need to create my own model, some hypothetical distribution of planet sizes and populations of sentient creatures, and show how Simpson’s method could give wrong answers. Perhaps worth doing sometime, but in any case I thought I’d post our discussion here. I appreciate Simpson’s cordial engagement with skeptics.
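Here’s a first pass at such a toy model. Everything below – the planet-size distribution and the population functions – is invented, just to show that the “typical observer” conclusion tracks the assumed coupling between planet size and population rather than the data:

```python
import random

random.seed(2)

# Hypothetical universe: radii of inhabited planets, in Earth units.
planets = [random.lognormvariate(0, 0.5) for _ in range(10_000)]

def mean_observer_radius(pop, n=2000):
    """Average radius of the home planets of n individuals drawn at
    random, when each planet's population scales as pop(radius)."""
    weights = [pop(r) for r in planets]
    draws = random.choices(planets, weights=weights, k=n)
    return sum(draws) / n

big = mean_observer_radius(lambda r: r ** 3)     # population grows with radius
small = mean_observer_radius(lambda r: r ** -3)  # population shrinks with radius
print(big, small)  # same planets, opposite "typical observer" conclusions
```

And if the random-sampling-of-observers assumption is wrong – if we’re just us, not a draw from this urn – then neither number tells us anything about Earth.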
P.S. See section 4 of this paper for a Bayesian discussion of a similar theory (the “doomsday argument”).
P.P.S. Regarding the prior distribution, Simpson writes:
– In equation (5) I employ a reference prior as denoted by pi(mu, sigma^2)
– Figure 2 (the main result of the paper) plots lines which are labelled ‘narrow prior’ and ‘broad prior’
– Most of Section 3 is spent discussing the choice of prior, and
studying the results of modifying the prior.