“An exact fishy test”

Macartan Humphreys supplied this amusing demo. Just click on the link and try it—it’s fun!

Here’s an example: I came up with 10 random numbers:

> round(.5+runif(10)*100)
 [1] 56 23 70 83 29 74 23 91 25 89

and entered them into Macartan’s app, which promptly responded:

Unbelievable!

You chose the numbers 56 23 70 83 29 74 23 91 25 89

But these are clearly not random numbers. We can tell because random numbers do not contain patterns, but the numbers you entered show a fairly obvious pattern.

Take another look at the sequence you put in. You will see that the number of prime numbers in this sequence is: 5. But the ‘expected number’ from a random process is just 2.5. How odd is this pattern? Quite odd, in fact. The probability that a truly random process would turn up numbers like this is just p=0.074 (i.e. less than 8%).

Try again (with really random numbers this time)!

ps: you might think that if the p value calculated above is high (for example, if it is greater than 15%), this means that the numbers you chose are not all that odd; but in fact it means that the numbers are really particularly odd, since the fishy test produces p values above 15% for less than 2% of all really random numbers. For more on how to fish see here.
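
The app doesn’t show its working, but the prime-count figure is easy to check: there are 25 primes up to 100, so each entry is prime with probability 0.25 and the expected count in 10 draws is 2.5. Assuming a one-sided binomial tail (the app’s exact computation isn’t published), the probability of seeing 5 or more primes comes out close to, though not exactly, the reported p=0.074:

> 1 - pbinom(4, size=10, prob=0.25)
[1] 0.07812691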

36 thoughts on ““An exact fishy test””

  1. Andrew:

    It seems to me they are performing a classical sampling-based test based on expectations, not an exact test based on permutations of given data.

    If so, your title is a little misleading.

  2. It really is amusing. My first try (well, second; the first one included a number greater than 100, but the applet dutifully included it in the summary. Feature or bug?) gave p=11% (neighboring digits), which, I guess, is right there. High enough to be considered “random”, but small enough not to be dismissed as “p-value is too high”.

  3. My “most random” contrived sequence: 4, 15, 27, 35, 47, 53, 69, 78, 83, and 98, with p=0.32. I was half-surprised not to see a complaint like, “the likelihood of a random sequence being monotonic is …”

  4. I think this is just a ‘fishy’ test, as it says. Like p-hacking. Any 10 numbers you enter will have some pattern in them if you look for enough different patterns, because at the end of the day there are only 10 numbers… not sure the test is even real…

  5. The fallacy is in the phrase, “random numbers do not contain patterns”. That’s baloney — which might be easier to explain with flipping coins: You can get any of the patterns HHHTTT, HTHTHT, TTTHHH, THTHTH, HHTTHH, TTHHTT, HHHHHH, TTTTTT by flipping a fair coin six times. Numbers just have more possible patterns.
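
    Indeed, every specific sequence of six flips has exactly the same probability, patterned-looking or not:

    > (1/2)^6
    [1] 0.015625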

    Another analogy: There are so many possible coincidences, that it’s almost certain that some “coincidence” will occur — e.g., it’s almost certain that a randomly chosen person was born on the birthday of some famous person. (If anyone claims they weren’t born on the birthday of a famous person, they just haven’t looked hard enough.)

    • Well, by definition random numbers do not contain patterns, so long as they are truly random and n → ∞. Your coin example is true but unrelated: a true comparison would be comparing the distribution of randomly flipped coins to a series of coins flipped infinitely many times, since in this scenario we are comparing the distribution of a set of 10 numbers to the distribution of the limit of a set of random numbers.

      • “by definition random numbers do not contain patterns”??? Since when? “A set of n random numbers” means (by definition) a set of n numbers chosen by a random process. This can indeed produce a set of numbers with a pattern.

        • It’s actually rather hard to define “a random process”; the most reasonable definitions involve either compressibility (Kolmogorov complexity) or an “ultimate, computable test” (Per Martin-Löf). With a small sample size (10 numbers) it’s basically impossible to reject the hypothesis that the numbers came from a uniform random number generator on the integers {1,2,…,100}.

          Though it’s easy to find *a* test which will reject this hypothesis if you use enough tests.

          Note, the “ultimate computable test” can be proven to exist, but it’s non-constructive. Figuring out which test to use is most likely non-computable. Hence, we use some kind of battery of tests à la Diehard and associated follow-up tests. They are only useful for pseudo-random number generators that can generate millions or billions of numbers for testing.
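
          As a toy illustration of the compressibility idea (gzip length is only a crude stand-in for Kolmogorov complexity, which is itself uncomputable), a patterned sequence compresses to a small fraction of its length while uniform draws compress hardly at all:

          > len <- function(x) length(memCompress(as.raw(x), "gzip"))
          > len(rep(c(1, 2), 500))            # obvious pattern: a few dozen bytes
          > len(sample.int(100, 1000, TRUE))  # little structure to exploit: stays large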

    • Chris:

      You’ll do fine as long as you tread carefully when working with data that are weak compared to available prior information (as in the various “Psychological Science”-style studies we’ve been discussing a lot lately). You can do reasonable non-Bayesian analysis of such data, but you have to be careful, as you’ll have to regularize or control your inferences one way or another to avoid falling into the abyss.

      • > …you’ll have to regularize or control your inferences one way or another to avoid falling into the abyss.

        Yes, in my previous job I spent a lot of time figuring out how to frame problems so that I could get a decent answer using ridge regression. The motivation was purely pragmatic. I was usually working with tight processing timelines – milliseconds, give or take. Ridge regression provided decent answers and met the timing requirements. (In that niche Patton’s maxim carried the day: “A good plan executed violently now is better than a perfect plan executed next week.”) In general, when deciding between H0 and H1 I’d rather have the Bayes factor than the likelihood ratio, but if calculating the integrals to get the Bayes factor blows my timeline then a GLRT will suffice. (Actually, making the H0/H1 decision based on the GLRT alone usually wasn’t good enough – I also needed to check the quality of the fit to the data under H1.)

        > You’ll do fine as long as you tread carefully when working with data that are weak compared to available prior information (as in the various “Psychological Science”-style studies we’ve been discussing a lot lately).

        One of the things I enjoy about your blog is reading about people working on problems which are extraordinarily different from the ones I work on, using tools which I have used or at least thought about using. The science is interesting, and I like to believe that seeing people work problems in other areas gives me a better appreciation of the methods that I’m trying to apply.

  6. All I have to say is that if my first five tries … by using “round(100*runif(10))” … all result in a response of “Unbelievable” and an average p-value of 0.047 … there is a problem with the test … or with runif()…

  7. There is a bug in the program: it needs to remove the leading zero from single-digit numbers when checking for patterns. And I think it should reveal the number of ways it checks for patterns (so that we can apply Bonferroni in our heads). It would also be interesting to see whether other multiple-testing correction methods work here.

  8. Thanks, Andy and all, for the comments. A couple of responses. It’s not a Fisher test but it is an exact test, and it is definitely fishy. It is fishy because it selects the specific test as a function of the data. There are multiple comparisons here; the point, however, is about fishing (not multiple comparisons), because the issue is not just a failure to correct for the many tests implemented, but a failure to report which tests were implemented. The point is that when people fish *you cannot do Bonferroni in your head.*

    I’ve done updates that respond more to out-of-range entries (treated as 0 probability events under the null) and also catch ascending and descending sequences. Now only 11% of random sequences generate p values >10%. So if you come up with a sequence with a p value between 10% and 11% you are “right there,” as D.O. said (except it turns out that only 1% of sequences are in that range, so even if you are right there you are not there).
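
    For concreteness, here is a toy version of the fishing in R (the app’s actual battery of tests isn’t published, so the three pattern statistics below are stand-ins): for each random sequence we compute a Monte Carlo p-value for each statistic and then, like a determined fisherman, report only the smallest one.

    set.seed(1)
    n_sim <- 5000
    draws <- matrix(sample.int(100, 10 * n_sim, TRUE), ncol = 10)
    primes <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
                53, 59, 61, 67, 71, 73, 79, 83, 89, 97)  # the 25 primes up to 100
    stats <- cbind(
      n_prime = rowSums(matrix(draws %in% primes, ncol = 10)),        # prime entries
      n_close = apply(draws, 1, function(x) sum(abs(diff(x)) <= 2)),  # close neighbors
      n_rep   = apply(draws, 1, function(x) 10 - length(unique(x)))   # repeated values
    )
    # Monte Carlo upper-tail p-value of each statistic for each sequence
    pvals <- apply(stats, 2, function(s) (n_sim - (rank(s, ties.method = "min") - 1)) / n_sim)
    fished <- apply(pvals, 1, min)  # keep only the best-looking test
    mean(fished <= 0.1)  # noticeably above the 10% a single pre-registered test would give

    Even with only three fixed tests the error rate inflates; with an unknown, data-dependent menu of tests there is no correction you can do in your head.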

    • Macartan:

      Great to see you in the blog. Just a quick comment that the concept of an “exact test” is rather ambiguous: https://en.wikipedia.org/wiki/Exact_test

      I suppose one could argue the test is exact because the null distribution is known _asymptotically_. But we are dealing with small samples, so not sure what to make of this. (It reminds me of incidental parameters problems where an estimator may be unbiased but not consistent (e.g. FE probit). But at this point I am speculating.)

  9. But people are so much better at fishing! A real fisherman would have picked a different pattern. Would this be useful as a Turing test? — when a clearly non-random sequence is provided, humans and machines recognize different patterns?
    ===================================================
    Unbelievable!

    You chose the numbers 01 02 03 01 22 33 99 99 99 99

    But these are clearly not random numbers. We can tell because random numbers do not contain patterns, but the numbers you entered show a fairly obvious pattern.

    Take another look at the sequence you put in. You will see that the number of times that two neighboring numbers are within 2 points of each other is: 6. But the ‘expected number’ from a random process is just 0.4. How odd is this pattern? Quite odd, in fact. The probability that a truly random process would turn up numbers like this is just p=0 (i.e. less than 1%).

    Try again (with really random numbers this time)!

    ps: you might think that if the p value calculated above is high (for example, if it is greater than 12%), this means that the numbers you chose are not all that odd; but in fact it means that the numbers are really particularly odd, since the probability that the fishy test would produce a p value above 12%, when really random sequences are used, is low (p<0.07). For more on how to fish see here.
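
    The ‘expected number’ of 0.4 checks out, assuming “within 2 points” means adjacent entries differing by at most 2 and the 10 entries are independent uniform draws on 1..100: each of the 9 adjacent pairs matches with probability 494/10000, so

    > 9 * sum(outer(1:100, 1:100, function(x, y) abs(x - y) <= 2)) / 100^2
    [1] 0.4446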

  10. Pingback: Stethoscope as weapon of mass distraction - Statistical Modeling, Causal Inference, and Social Science

  11. Side note on generating ten random integers from {1,2,…,100} with R: sample.int(100, 10, T) gets the job done nicely. I find that sample() and sample.int() are very useful functions in R.
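
    A quick comparison of the generators mentioned in this thread (the endpoint behavior may also explain some of comment 6’s trouble):

    > sample.int(100, 10, replace = TRUE)  # uniform on 1..100
    > round(.5 + runif(10) * 100)          # the post's version: also uniform on 1..100
    > round(100 * runif(10))               # comment 6's version: ranges over 0..100, with
                                           # 0 and 100 getting half the weight of interior values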
