“An exact fishy test”

Macartan Humphreys supplied this amusing demo. Just click on the link and try it—it’s fun!

Here’s an example: I came up with 10 random numbers:

> round(.5+runif(10)*100)
 [1] 56 23 70 83 29 74 23 91 25 89

and entered them into Macartan’s app, which promptly responded:

Unbelievable!

You chose the numbers 56 23 70 83 29 74 23 91 25 89

But these are clearly not random numbers. We can tell because random numbers do not contain patterns, but the numbers you entered show a fairly obvious pattern.

Take another look at the sequence you put in. You will see that the number of prime numbers in this sequence is: 5. But the ‘expected number’ from a random process is just 2.5. How odd is this pattern? Quite odd, in fact. The probability that a truly random process would turn up numbers like this is just p=0.074 (i.e. less than 8%).

Try again (with really random numbers this time)!

ps: you might think that if the p value calculated above is high (for example, if it is greater than 15%), this means that the numbers you chose are not all that odd; but in fact it means that the numbers are really particularly odd, since the fishy test produces p values above 15% for less than 2% of all really random numbers. For more on how to fish see here.
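
The app doesn’t show its working, but the prime-count figure is easy to check: there are 25 primes up to 100, so each entry is prime with probability 0.25 and the expected count in 10 draws is 2.5. Assuming a one-sided binomial tail (the app’s exact computation isn’t published), the probability of seeing 5 or more primes comes out close to, though not exactly, the reported p=0.074:

> 1 - pbinom(4, size=10, prob=0.25)
[1] 0.07812691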

36 thoughts on ““An exact fishy test””

  1. Andrew:

    It seems to me they are performing a classical sampling-based test based on expectations, not an exact test based on permutations of given data.

    If so, your title is a little misleading.

  2. It really is amusing. My first try (well, second; the first one included a number greater than 100, but the applet dutifully included it in the summary. Feature or bug?) gave p=11% (neighboring digits), which, I guess, is right there. High enough to be considered “random”, but small enough not to be dismissed as “p-value is too high”.

  3. My “most random” contrived sequence: 4, 15, 27, 35, 47, 53, 69, 78, 83, and 98, with p=0.32. I was half-surprised not to see a complaint like, “the likelihood of a random sequence being monotonic is …”

  4. I think this is just a ‘fishy’ test, as it says. Like p-hacking. Any 10 numbers you enter will have some pattern in them if you look for enough different patterns, because at the end of the day there are only 10 numbers… not sure the test is even real…

  5. The fallacy is in the phrase, “random numbers do not contain patterns”. That’s baloney — which might be easier to explain with flipping coins: You can get any of the patterns HHHTTT, HTHTHT, TTTHHH, THTHTH, HHTTHH, TTHHTT, HHHHHH, TTTTTT by flipping a fair coin six times. Numbers just have more possible patterns.
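
    Indeed, every specific sequence of six flips has exactly the same probability, patterned-looking or not:

    > (1/2)^6
    [1] 0.015625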

    Another analogy: There are so many possible coincidences, that it’s almost certain that some “coincidence” will occur — e.g., it’s almost certain that a randomly chosen person was born on the birthday of some famous person. (If anyone claims they weren’t born on the birthday of a famous person, they just haven’t looked hard enough.)

    • Well, by definition random numbers do not contain patterns, so long as they are truly random and n → ∞. Your coin example is true but unrelated: a true comparison would be comparing the distribution of randomly flipped coins to a series of coins flipped infinitely many times, since in this scenario we are comparing the distribution of a set of 10 numbers to the distribution of the limit of a set of random numbers.

      • “by definition random numbers do not contain patterns”??? Since when? “A set of n random numbers” means (by definition) a set of n numbers chosen by a random process. This can indeed produce a set of numbers with a pattern.

        • It’s actually rather hard to define “a random process”; the most reasonable definitions involve either compressibility (Kolmogorov complexity) or an “ultimate, computable test” (Per Martin-Löf). With a small sample size (10 numbers) it’s basically impossible to reject the hypothesis that the numbers came from a uniform random number generator on the integers {1,2,…,100}.

          Though it’s easy to find *a* test which will reject this hypothesis if you use enough tests.

          Note, the “ultimate computable test” can be proven to exist, but it’s non-constructive. Figuring out which test to use is most likely non-computable. Hence, we use some kind of battery of tests à la Diehard and associated follow-up tests. They are only useful for pseudo-random number generators that can generate millions or billions of numbers for testing.
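
          As a toy illustration of the compressibility idea (gzip length is only a crude stand-in for Kolmogorov complexity, which is itself uncomputable), a patterned sequence compresses to a small fraction of its length while uniform draws compress hardly at all:

          > len <- function(x) length(memCompress(as.raw(x), "gzip"))
          > len(rep(c(1, 2), 500))            # obvious pattern: a few dozen bytes
          > len(sample.int(100, 1000, TRUE))  # little structure to exploit: stays large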

    • Chris:

      You’ll do fine as long as you tread carefully when working with data that are weak compared to available prior information (as in the various “Psychological Science”-style studies we’ve been discussing a lot lately). You can do reasonable non-Bayesian analysis of such data, but you have to be careful, as you’ll have to regularize or control your inferences one way or another to avoid falling into the abyss.

      • > …you’ll have to regularize or control your inferences one way or another to avoid falling into the abyss.

        Yes, in my previous job I spent a lot of time figuring out how to frame problems so that I could get a decent answer using ridge regression. The motivation was purely pragmatic. I was usually working with tight processing timelines – milliseconds, give or take. Ridge regression provided decent answers and met the timing requirements. (In that niche Patton’s maxim carried the day: “A good plan executed violently now is better than a perfect plan executed next week.”) In general, when deciding between H0 and H1 I’d rather have the Bayes factor than the likelihood ratio, but if calculating the integrals to get the Bayes factor blows my timeline then a GLRT will suffice. (Actually, making the H0/H1 decision based on the GLRT alone usually wasn’t good enough – I also needed to check the quality of the fit to the data under H1.)

        > You’ll do fine as long as you tread carefully when working with data that are weak compared to available prior information (as in the various “Psychological Science”-style studies we’ve been discussing a lot lately).

        One of the things I enjoy about your blog is reading about people working on problems which are extraordinarily different from the ones I work on, using tools which I have used or at least thought about using. The science is interesting, and I like to believe that seeing people work problems in other areas gives me a better appreciation of the methods that I’m trying to apply.

  6. All I have to say is that if my first five tries … by using “round(100*runif(10))” … all result in a response of “Unbelievable” and an average p-value of 0.047 … there is a problem with the test … or with runif()…

  7. There is a bug in the program: it needs to remove the leading zero from single-digit numbers when checking for patterns. And I think it should reveal the number of ways it checks for patterns (so that we can apply Bonferroni in our heads). It would also be interesting to see whether other multiple-testing correction methods work here.

  8. Thanks, Andy and all, for the comments. A couple of responses. It’s not a Fisher test but it is an exact test, and it is definitely fishy. It is fishy because it selects the specific test as a function of the data. There are multiple comparisons here; the point, however, is about fishing (not multiple comparisons), because the issue is not just a failure to correct for the many tests implemented, but a failure to report which tests were implemented. The point is that when people fish *you cannot do Bonferroni in your head.*

    I’ve done updates that respond more to out-of-range entries (treated as 0 probability events under the null) and also catch ascending and descending sequences. Now only 11% of random sequences generate p values >10%. So if you come up with a sequence with a p value between 10% and 11% you are “right there,” as D.O. said (except it turns out that only 1% of sequences are in that range, so even if you are right there you are not there).
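
    For concreteness, here is a toy version of the fishing in R (the app’s actual battery of tests isn’t published, so the three pattern statistics below are stand-ins): for each random sequence we compute a Monte Carlo p-value for each statistic and then, like a determined fisherman, report only the smallest one.

    set.seed(1)
    n_sim <- 5000
    draws <- matrix(sample.int(100, 10 * n_sim, TRUE), ncol = 10)
    primes <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
                53, 59, 61, 67, 71, 73, 79, 83, 89, 97)  # the 25 primes up to 100
    stats <- cbind(
      n_prime = rowSums(matrix(draws %in% primes, ncol = 10)),        # prime entries
      n_close = apply(draws, 1, function(x) sum(abs(diff(x)) <= 2)),  # close neighbors
      n_rep   = apply(draws, 1, function(x) 10 - length(unique(x)))   # repeated values
    )
    # Monte Carlo upper-tail p-value of each statistic for each sequence
    pvals <- apply(stats, 2, function(s) (n_sim - (rank(s, ties.method = "min") - 1)) / n_sim)
    fished <- apply(pvals, 1, min)  # keep only the best-looking test
    mean(fished <= 0.1)  # noticeably above the 10% a single pre-registered test would give

    Even with only three fixed tests the error rate inflates; with an unknown, data-dependent menu of tests there is no correction you can do in your head.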

    • Macartan:

      Great to see you in the blog. Just a quick comment that the concept of an “exact test” is rather ambiguous: https://en.wikipedia.org/wiki/Exact_test

      I suppose one could argue the test is exact because the null distribution is known _asymptotically_. But we are dealing with small samples, so not sure what to make of this. (It reminds me of incidental parameters problems where an estimator may be unbiased but not consistent (e.g. FE probit). But at this point I am speculating.)

  9. But people are so much better at fishing! A real fisherman would have picked a different pattern. Would this be useful as a Turing test? — when a clearly non-random sequence is provided, humans and machines recognize different patterns?
    ===================================================
    Unbelievable!

    You chose the numbers 01 02 03 01 22 33 99 99 99 99

    But these are clearly not random numbers. We can tell because random numbers do not contain patterns, but the numbers you entered show a fairly obvious pattern.

    Take another look at the sequence you put in. You will see that the number of times that two neighboring numbers are within 2 points of each other is: 6. But the ‘expected number’ from a random process is just 0.4. How odd is this pattern? Quite odd, in fact. The probability that a truly random process would turn up numbers like this is just p=0 (i.e. less than 1%).

    Try again (with really random numbers this time)!

    ps: you might think that if the p value calculated above is high (for example, if it is greater than 12%), this means that the numbers you chose are not all that odd; but in fact it means that the numbers are really particularly odd, since the probability that the fishy test would produce a p value above 12%, when really random sequences are used, is low (p<0.07). For more on how to fish see here.
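
    The ‘expected number’ of 0.4 checks out, assuming “within 2 points” means adjacent entries differing by at most 2 and the 10 entries are independent uniform draws on 1..100: each of the 9 adjacent pairs matches with probability 494/10000, so

    > 9 * sum(outer(1:100, 1:100, function(x, y) abs(x - y) <= 2)) / 100^2
    [1] 0.4446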

  10. Pingback: Stethoscope as weapon of mass distraction - Statistical Modeling, Causal Inference, and Social Science

  11. Side note on generating ten random integers from {1,2,…,100} with R: sample.int(100, 10, T) gets the job done nicely. I find that sample() and sample.int() are very useful functions in R.
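
    A quick comparison of the generators mentioned in this thread (the endpoint behavior may also explain some of comment 6’s trouble):

    > sample.int(100, 10, replace = TRUE)  # uniform on 1..100
    > round(.5 + runif(10) * 100)          # the post's version: also uniform on 1..100
    > round(100 * runif(10))               # comment 6's version: ranges over 0..100, with
                                           # 0 and 100 getting half the weight of interior values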
