Following up on our discussion from last week on inference for fisheries, Anders Lamberg writes:

Since I first sent you the question, there has been a debate here too.

In the discussion you sent, there is a debate both about the actual sampling (the mathematics) and about the more practical/biological issues: how accurately can farmed fish be separated from wild fish, is the 5 % limit for farmed fish correct, etc. New data are constantly being acquired on these practical questions. I am not worried about that, because there is an ongoing process that makes the methods better.

However, it is the other question, the use of statistics and models, that until recently has not been discussed properly. Here a lot of biologists have used the concept “confidence interval” without really understanding what it means. I gave you an example of sampling 60 salmon in a population of 3000. There are a lot of examples where the sample size has been as low as 10 individuals. The problem has been how to interpret the uncertainty. Here is a constructed (but not far from realistic) example:

Population size is 3000. From different samples you could hypothetically get these three results:

1) You sample 10, get 1 farmed fish. This gives 10 % farmed fish

2) You sample 30, get 3 farmed fish. This gives 10 % farmed fish

3) You sample 60, get 6 farmed fish. This gives 10 % farmed fish

All surveys show the same result, but they are dramatically different when you have to draw a conclusion.
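To make the difference between the three surveys concrete, here is a minimal sketch that computes an approximate 95 % interval for each one. It uses the Wilson score interval, which is my choice for illustration and not a method named in the letter, and it assumes simple random sampling while ignoring the finite population of 3000. The function name `wilson_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(k, n, conf=0.95):
    """Wilson score interval for a proportion, given k farmed fish in a sample of n.

    Assumes simple random sampling from a large population (no finite
    population correction), which is a simplification.
    """
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # about 1.96 for conf=0.95
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# The three constructed surveys from the example: (farmed fish, sample size).
samples = [(1, 10), (3, 30), (6, 60)]
intervals = [wilson_interval(k, n) for k, n in samples]

for (k, n), (lo, hi) in zip(samples, intervals):
    print(f"n={n:2d}, farmed={k}: estimate {k / n:.0%}, "
          f"95% interval {lo:.1%} to {hi:.1%}")
```

The point estimate is 10 % in every row, but the interval narrows as the sample grows, which is the sense in which the three surveys differ.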

When reporting the sampling (current practice), the point estimate of 10 % is the main reported result. Sometimes the confidence interval with upper and lower limits is also reported, but not discussed. Since only one sample is drawn from the population, not discussing the uncertainty with such small samples can lead to wrong conclusions. In most projects a typical biologist reports on, the results are part of a hypothetico-deductive research process. The new thing with the farmed salmon surveys is that the results are measured against a defined limit: 5 %. If the point estimate is above 5 %, it means millions in costs (actually billions) for the industry. On the other hand, if the observed point estimate is below 5 %, the uncertainty could affect the wild salmon populations. This could result in a long-term disaster for the wild salmon.

At the risk of being viciously tabloid: The biologists (and I am one of them) have suddenly come into a situation where their reports have direct consequences. The question of the frequency of farmed salmon in the wild populations has become a political question in Norway, at the highest level. Suddenly we have to really discuss uncertainty in our data. I do not say that all biologists have been ignorant, but I suspect that a lot of publications have not addressed, and do not address, uncertainty with respect.

In the last few months, more mathematical expertise here in Norway has been involved in the “farmed salmon question.” The conclusion so far is that you cannot use the point estimate. You have to view the confidence interval as a test of a hypothesis:

H: The level of farmed salmon is over 5 %

If the 95 % confidence interval has an upper limit of 5 % or higher, you have to start measures. If, for example, the point estimate is 1 % but the upper limit of the 95 % confidence interval is 6 %, we must start the job of removing farmed salmon from that population. The problem with this is that the confidence intervals from almost all the surveys will contain the critical value of 5 % (even when the point estimate is much lower), so in most populations you cannot reject the hypothesis. The reason the intervals contain the critical value is the small sample sizes.

To use this kind of sampling procedure, your sample size should exceed about 200 salmon to give a result that will give the fish farming industry fair treatment. On the other hand, small sample sizes and large confidence intervals will always be a benefit for the wild salmon. I would like that on behalf of nature, but we biologists will then not be relevant as experts who give advice to society as a whole.
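The link between sample size and the "upper limit at or above 5 %" rule can be sketched numerically. The code below computes an exact (Clopper-Pearson style) upper limit by bisection on the binomial distribution, for a hypothetical observed frequency of 2 % at several sample sizes. The 2 % figure, the sample sizes, and the function names are assumptions for illustration only; the letter does not say how the figure of about 200 was derived.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def exact_upper_limit(k, n, conf=0.95):
    """Upper end of a two-sided exact binomial interval, found by bisection.

    Solves P(X <= k | n, p) = (1 - conf) / 2 for p. Assumes simple random
    sampling, which the letter itself questions for rod catches.
    """
    if k >= n:
        return 1.0
    tail = (1 - conf) / 2
    lo, hi = k / n, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > tail:
            lo = mid  # p too small: k or fewer farmed fish is still too likely
        else:
            hi = mid
    return hi


LIMIT = 0.05  # the 5 % threshold from the letter

# Hypothetical surveys that all observe 2 % farmed fish: (farmed, sample size).
surveys = [(1, 50), (2, 100), (4, 200), (8, 400)]
uppers = [exact_upper_limit(k, n) for k, n in surveys]

for (k, n), upper in zip(surveys, uppers):
    action = "start measures" if upper >= LIMIT else "no measures"
    print(f"n={n:3d}, farmed={k}: upper limit {upper:.1%} -> {action}")
```

Under these assumptions the upper limit falls as the sample grows, so the same observed frequency can trigger measures in a small survey and not in a large one.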

Then there are a lot of practical implications linked to the minimum sample size of 200. Since the sampling is done by rod catch, some salmon will die due to the sampling procedure. But the most serious problem with the sampling is that several new reports now show that the farmed fish will more frequently take the bait. It has been shown that the catchability of farmed salmon is from 3 to 10 times higher than that of wild salmon. This will vary, so you cannot put a constant factor into the calculations.
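To see how much the catchability ratio matters, here is a small sketch under a deliberately simple assumed model: each farmed fish is `c` times as likely to be caught as each wild fish, so a true farmed fraction `f` shows up in the catch as `c*f / (c*f + (1 - f))`. This model and the function names are my assumptions, not something taken from the reports mentioned in the letter.

```python
def expected_catch_fraction(true_fraction, c):
    """Farmed fraction expected in the catch when farmed fish are c times as catchable."""
    return c * true_fraction / (c * true_fraction + (1 - true_fraction))


def implied_true_fraction(observed_fraction, c):
    """Invert the model above: farmed fraction in the river implied by the catch."""
    return observed_fraction / (observed_fraction + c * (1 - observed_fraction))


# An observed 10 % farmed fish in the catch, across the reported range of ratios.
observed = 0.10
for c in (1, 3, 5, 10):
    print(f"catchability ratio {c:2d}: implied true fraction "
          f"{implied_true_fraction(observed, c):.1%}")
```

Because the implied fraction changes a lot between ratios of 3 and 10, an uncertain and varying ratio adds uncertainty on top of the sampling error, which is the letter's point about not being able to plug in a constant.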

The solution so far seems to be to use other methods to acquire the samples. Snorkeling surveys in the rivers, performed by trained persons, show that over 85 % of the farmed fish are correctly classified. Since a snorkeling survey covers from 80 to 100 % of the population, the only significant error is misclassification, which is a small error compared to the uncertainty of small-sample procedures.

Thanks again for showing interest in this question. The research institutions in Norway have not been very willing to even discuss the topic. I suspect that has to do with money. Fish farmers focus on growth and money, but sadly, I guess the researchers involved in monitoring environmental impacts also see that a crisis gives more money for research. Therefore it is important to have the discussion free of all questions about money. Here in Norway I miss the kind of approach you have to the topic. The discussion, development, and testing of new hypotheses is the reason we became biologists, is it not? It is the closest you come to being a criminal investigator. We did not want to become politicians.

My general comment is to remove the whole “hypothesis” thing. It’s an estimation problem. You’re trying to estimate the level of farmed fish, which varies over time and across locations. And you have some decisions to make. I see zero benefit, and much harm, to framing this as a hypothesis testing problem.

Wald and those other guys from the 1940s were brilliant, doing statistics and operations research in real time during a real-life war. But the framework they were using was improvised, it was rickety, and in the many decades since, people keep trying to adapt it in inappropriate settings. Time to attack inference and decision problems directly, instead of tying yourself into knots with hypotheses and confidence intervals and upper limits and all the rest.
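As one very simplified illustration of treating this as estimation, the sketch below computes the posterior probability that the farmed fraction exceeds 5 % for the three constructed surveys in the letter. It assumes a flat Beta(1, 1) prior and simple random sampling, both of which are my assumptions for the sake of a short example; a realistic analysis would also model variation over time and locations and the catchability issue raised above. The function name `prob_above` is hypothetical.

```python
from math import comb


def prob_above(limit, k, n):
    """Posterior P(fraction > limit) after seeing k farmed fish in a sample of n.

    Assumes a uniform Beta(1, 1) prior, so the posterior is Beta(k+1, n-k+1).
    For integer parameters the Beta tail equals a binomial sum:
    P(fraction > x) = P(Binomial(n+1, x) <= k).
    """
    m = n + 1
    return sum(comb(m, j) * limit**j * (1 - limit)**(m - j)
               for j in range(k + 1))


# The three constructed surveys from the letter: (farmed fish, sample size).
surveys = [(1, 10), (3, 30), (6, 60)]
probs = [prob_above(0.05, k, n) for k, n in surveys]

for (k, n), pr in zip(surveys, probs):
    print(f"n={n:2d}, farmed={k}: P(farmed fraction > 5%) = {pr:.2f}")
```

A probability like this can be fed directly into a decision about whether to act, weighed against the costs on each side, without declaring a hypothesis accepted or rejected.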