The Devil is in the Digits

Posted on June 20, 2009 4:44 PM by Andrew

Bernd Beber and Alex Scacco present another quantitative analysis of the Iranian election data, this time looking at last digits. They write:

[Suspicions of fraud] have led experts to speculate that the election results released by Iran’s Ministry of the Interior had been altered behind closed doors. But we don’t have to rely on suggestive evidence alone. We can use statistics more systematically to show that this is likely what happened. Here’s how.

We’ll concentrate on vote counts — the number of votes received by different candidates in different provinces — and in particular the last and second-to-last digits of these numbers. For example, if a candidate received 14,579 votes in a province (Mr. Karroubi’s actual vote count in Isfahan), we’ll focus on digits 7 and 9.

This may seem strange, because these digits usually don’t change who wins. In fact, last digits in a fair election don’t tell us anything about the candidates, the make-up of the electorate or the context of the election. They are random noise in the sense that a fair vote count is as likely to end in 1 as it is to end in 2, 3, 4, or any other numeral. But that’s exactly why they can serve as a litmus test for election fraud. For example, an election in which a majority of provincial vote counts ended in 5 would surely raise red flags.

Why would fraudulent numbers look any different? The reason is that humans are bad at making up numbers. Cognitive psychologists have found that study participants in lab experiments asked to write sequences of random digits will tend to select some digits more frequently than others. . . .

The ministry provided data for 29 provinces, and we examined the number of votes each of the four main candidates — Ahmadinejad, Mousavi, Karroubi and Mohsen Rezai — is reported to have received in each of the provinces — a total of 116 numbers.

The numbers look suspicious. We find too many 7s and not enough 5s in the last digit. We expect each digit (0, 1, 2, and so on) to appear at the end of 10 percent of the vote counts. But in Iran’s provincial results, the digit 7 appears 17 percent of the time, and only 4 percent of the results end in the number 5. . . .
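To make the last-digit check concrete, here is a minimal sketch of how one might tabulate last digits and compute a Pearson chi-square statistic against the 10-percent-each expectation. The function names are my own, and the 116 vote counts are synthetic placeholders generated with a fixed seed, not the ministry's figures; the excerpt does not say which test statistic the authors used.

```python
import random
from collections import Counter


def last_digit_counts(vote_counts):
    """Return a list of 10 integers: how many counts end in 0, 1, ..., 9."""
    tally = Counter(n % 10 for n in vote_counts)
    return [tally.get(d, 0) for d in range(10)]


def chi_square_uniform(observed):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# ASSUMPTION: synthetic stand-in for the 116 provincial counts (not real data).
rng = random.Random(0)
synthetic_counts = [rng.randrange(1_000, 2_000_000) for _ in range(116)]

counts = last_digit_counts(synthetic_counts)
stat = chi_square_uniform(counts)
print(counts, round(stat, 2))
```

With ten digit categories the statistic has 9 degrees of freedom, so a value above roughly 16.9 would be unusual at the 5 percent level under uniform last digits.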

Psychologists have also found that humans have trouble generating non-adjacent digits (such as 64 or 17, as opposed to 23) as frequently as one would expect in a sequence of random numbers. To check for deviations of this type, we examined the pairs of last and second-to-last digits in Iran’s vote counts. On average, if the results had not been manipulated, 70 percent of these pairs should consist of distinct, non-adjacent digits.
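The 70 percent figure can be checked by enumerating all 100 equally likely two-digit endings. This sketch assumes, as the 70 percent figure seems to imply, that 0 and 9 count as adjacent (digits arranged on a circle); treating them as non-adjacent would give 72 instead. The helper name is hypothetical.

```python
def is_non_adjacent(a, b):
    """True when digits a and b are distinct and not neighbours on a 0-9 circle."""
    gap = abs(a - b)
    # gap 0 = identical digits, gap 1 = adjacent, gap 9 = the 0/9 wraparound
    return gap not in (0, 1, 9)


pairs = [(a, b) for a in range(10) for b in range(10)]
non_adjacent = sum(is_non_adjacent(a, b) for a, b in pairs)
print(non_adjacent, "of", len(pairs))  # 70 of 100
```

The examples in the text come out as expected: 64 and 17 are non-adjacent pairs, while 23 is not.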

Not so in the data from Iran: Only 62 percent of the pairs contain non-adjacent digits. This may not sound so different from 70 percent, but the probability that a fair election would produce a difference this large is less than 4.2 percent. And while our first test–variation in last-digit frequencies–suggests that Rezai’s vote counts are the most irregular, the lack of non-adjacent digits is most striking in the results reported for Ahmadinejad.
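One way to get a feel for that probability is an exact binomial tail. This is a rough sketch and not necessarily the authors' calculation: it assumes the 116 digit pairs are independent, that each is non-adjacent with probability 0.70, and that "62 percent" corresponds to 72 of 116 pairs (my rounding).

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_pairs = 116                      # 29 provinces x 4 candidates
observed = round(0.62 * n_pairs)   # ASSUMPTION: 62 percent taken as 72 pairs
tail = binom_cdf(observed, n_pairs, 0.70)
print(observed, round(tail, 3))
```

Under these assumptions the one-sided tail should land in the same few-percent range as the figure quoted above.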

I don’t know if I’d go so far as to say that these findings “leave very little room for reasonable doubt”–but I agree that they tell a coherent story that is consistent with manipulation of the vote counts, especially taken together with Walter Mebane’s recent calculations using a slightly different set of numbers. More exploration is needed. I took a crude look at the last digits of the numbers in one of Mebane’s files, and I didn’t see any pattern, but I wouldn’t say I did anything like a careful check.

P.S. More here.

4 thoughts on “The Devil is in the Digits”

ZBicyclist on June 20, 2009 at 3:51 pm said:

More 7's in the beginning (earlier post), more 7's at the end.

There's an old trick that says "think of a number between 1 and 10" — 7 is the most common answer, far more than 1 chance in 10.

Bill Harris on June 21, 2009 at 6:23 pm said:

Interesting analysis. I'm curious: would it be helpful to run it on other elections, including some strongly believed to be honest and others strongly believed to be rigged? Might that help test the effectiveness of this as a test?

Zach on June 22, 2009 at 4:02 am said:

A coherent but misleading story. The non-adjacent numbers bit is interesting because of its connection to behavioral psychology. The number frequency observation is not indicative of fraud. There's a 3.5% chance of those frequencies in the last digit, sure, but there's also a 3.5% chance of those numbers in the second-to-last digit. There's a 3.5% chance that two numbers will appear too frequently above some threshold, and a 3.5% chance that two numbers will appear infrequently. These occurrences could happen at one digit or be spread amongst multiple digits. Three numbers could appear too frequently/infrequently. It would be much more unlikely for all of the digits to appear 11 or 12 times (10%). And on and on; there's a 100% chance that something equally improbable and equally simple to explain will occur.
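The multiple-comparisons point can be put in rough numbers. The sketch below takes the 3.5% per-pattern figure from the comment and assumes, purely for illustration, that the candidate patterns are independent and equally likely; real digit patterns overlap, so this is only a back-of-the-envelope bound, and the test counts are hypothetical.

```python
def chance_of_any_hit(per_test_prob, n_tests):
    """Probability that at least one of n independent tests fires by chance."""
    return 1 - (1 - per_test_prob) ** n_tests


# ASSUMPTION: the numbers of candidate patterns below are illustrative only.
for n_tests in (1, 5, 20, 100):
    print(n_tests, round(chance_of_any_hit(0.035, n_tests), 3))
```

With 20 such patterns to choose from, the chance that at least one shows up in a fair election is already about one in two.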

Tellingly, the 2008 election returns for Obama/McCain (from the Wikipedia entry) that are used here turn out, on further investigation, to be a good counterexample. For the second-to-last digit (for the set of 102 Obama/McCain vote totals), 7 appears 20% of the time and 8 only 5% of the time. The odds of this happening randomly (1.5%) are lower than those of the event observed in this article (3.5%).

Did the authors hypothesize that a fair election would lack a frequent and infrequent number for the last digit before looking at the results, or would they have been equally satisfied with any of the hundreds of equivalent events?

I think you recognize that the reasoning here is flawed from your comments, and I don't know why you are passing it along without criticizing the methods.

That said, I think a new election is needed in Iran; if the election was fair (well, not completely fraudulent), Ahmadinejad would win again anyway.

Johannes Hüsing on June 22, 2009 at 4:42 am said:

Yes, it may allow you to formulate more specific alternative hypotheses, such as singling out the frequency of '7's.

Obviously, specific tests will be unable to detect the work of fraudsters afraid of them, so I smell a cat-and-mouse game here.
