“Constructing expert indices measuring electoral integrity” — reply from Pippa Norris

This morning I posted a criticism of the Electoral Integrity Project, a survey organized by Pippa Norris and others to assess elections around the world.

Norris sent me a long response which I am posting below as is. I also invited Andrew Reynolds, the author of the controversial op-ed, to contribute to the discussion.

Here’s Norris:

Constructing expert indices measuring electoral integrity

Pippa Norris

Harvard University and University of Sydney

For the last five years, the Electoral Integrity Project, an independent research project based at Harvard and Sydney Universities, has conducted the Perceptions of Electoral Integrity global study. Following the November 8th 2016 elections, this method was applied to compare the electoral integrity of 50 U.S. States plus DC (PEI-US-2016), with the survey gathering responses from over 700 political scientists.

The results were published by the EIP team in two blog reports comparing electoral integrity across US states, with a longer piece published on 24 December in Vox and a shorter piece published on 27 December in the Monkey Cage. The results were also highlighted in commentary by Andrew Reynolds published on 22nd December in News and Observer “North Carolina is no longer classified as a democracy.” The release went viral in the media, spawning dozens of articles.

There has also been considerable interest among scholars. For example, since becoming available on 15th December, the PEI-US-2016 datasets have been downloaded via Dataverse over 1,100 times, which must be something of a record.

The PEI study raises some important general issues about the use and construction of expert indices, as well as the specific methods employed by the Electoral Integrity project. Andrew Gelman highlights several questions about the methods which are used. This note provides a brief response to both issues.

Constructing expert indices

Indices and datasets derived from expert surveys have become increasingly common in comparative social science, in risk analysis by private sector organizations, in evaluation research, and among NGOs and policy makers (Meyer & Booker 1991; Cooley and Snyder 2015). This data collection technique has been applied to diverse research topics such as the series of studies on party and policy positioning (Laver and Hunt 1992; Huber and Inglehart 1995; Saiegh 2009; Laver, Benoit, and Sauger 2006; McElroy and Benoit, 2007), the power of prime ministers (O’Malley 2007), evaluations of electoral systems (Bowler, Farrell, and Pettitt 2005), policy constraints horizons (Warwick 2005), campaign communications (Lilleker, Stetka and Tenscher 2015), human rights and democracy (Landman and Carvalho 2010), and the quality of public administration (Teorell, Dahlstrom and Dahlberg 2011). Expert surveys have been widely used in research on corruption, as in the Corruption Perceptions Index (Transparency International 2013; Global Integrity); in measuring democracy since 1900, as in Varieties of Democracy (Coppedge et al. 2011); and in research on electoral integrity (Norris, 2014; Norris, 2015; Martinez i Coma and Van Ham, 2015). The World Bank Institute Good Governance indicators combine an extensive range of expert perceptual surveys drawn from the public and private sectors. Indeed, among the mainstream indicators of democracy, Freedom House’s estimates of political rights and civil liberties, Polity IV’s classification of autocracies and democracies, and the Economist Intelligence Unit’s estimates of democracy are all, in different ways, dependent upon expert judgments.

Expert surveys seem especially useful for measuring complex concepts that require expert knowledge and evaluative judgments; and for measuring phenomena for which alternative sources of information are scarce (Schedler 2012). Yet, expert surveys are far from risk free and several scholars have pointed out their limitations (Budge, 2000; Mair 2001; Steenbergen and Marks, 2007). Moreover, in contrast to mass social surveys, a common methodology to construct such surveys is lacking, as well as agreed technical standards and codes of good practice. The most heated debate has focused on the pros and cons of methods used to evaluate the spatial positions of party policies, and about the use of governance indicators more generally.

Nevertheless, by contrast there has been remarkably little discussion about the challenges of validity, reliability, and legitimacy facing the construction of expert perceptual surveys. Yet it is critical to consider these issues given the lack of a clear conceptualization and sampling universe of ‘experts’, contrasting selection procedures and reliance upon domestic and international experts, variations in the number of respondents and publication of confidence intervals, and lack of consistent standards in levels of transparency and the provision of technical information.

Moreover, more research needs to be done on how to evaluate the consequences of expert and context heterogeneity on the validity of expert judgments (Martinez i Coma and van Ham 2015), for example by using item response models to test and correct for expert heterogeneity (Pemstein et al. 2015), and using techniques such as ‘anchoring vignettes’ (King & Wand 2007) or ‘bridge coders’ (V-Dem) to test and correct for context heterogeneity.

The Electoral Integrity Project

With these general points in mind, and to address the issues raised by Gelman more directly, what approach and techniques does the Electoral Integrity Project use when constructing the Perceptions of Electoral Integrity index?

To start to gather new evidence, on 1st July 2012 the project launched the expert survey of Perceptions of Electoral Integrity. The design was developed in consultation with Professor Jorgen Elklit (Aarhus University) and Professor Andrew Reynolds (University of North Carolina, Chapel Hill).

Global Coverage:

The PEI survey of electoral integrity focuses upon independent nation-states around the world which have held direct (popular) elections for the national parliament or presidential elections. The criteria for inclusion are listed below. The elections analyzed in the most recent release (PEI-4.5) cover the period from 1 July 2012 to 30 June 2016. In total, PEI 4.5 covers 213 elections in 153 nations.[1] The next release (PEI-5.0), expanding coverage to elections held during the last 6 months of 2016, will be in March 2017.

Criteria for inclusion in the survey | # | Definition and source
Total number of independent nation-states | 194 | Membership of the United Nations (plus Taiwan)
Excluded categories: | |
Micro-states | 12 | Population less than 100,000 in 2013, including Andorra, Antigua and Barbuda, Dominica, Liechtenstein, Marshall Islands, Monaco, Nauru, Palau, Saint Kitts and Nevis, San Marino, Seychelles, and Tuvalu.
Without de jure direct (popular) elections for the lower house of the national legislature | 5 | Brunei Darussalam, China, Qatar, UAE, and Saudi Arabia
State has constitutional provisions for direct (popular) elections for the lower house of the national legislature, but none have been held since independence or within the last 30 years (de facto) | 3 | Eritrea, Somalia, and South Sudan
Sub-total of nation-states included in the survey | 174 |
Covered to date in the PEI 4.5 dataset (from mid-2012 to mid-2016) | 153 | 87% of the sub-total of nation-states

Because of the selection rules, elections contained in each cumulative release of the PEI survey can be treated as a representative cross-section of all national presidential and legislative elections around the world (with the exception of the exclusion of micro-states). The countries in PEI 4.5 are broadly similar in political and socio-economic characteristics to those countries holding national elections which are not yet covered in the survey, although being slightly larger in population size.

More recently the EIP project has also collaborated with local teams of scholars and conducted several sub-national surveys using similar methods but at the level of provinces, states or other sub-national units, including India, the US, Mexico, the UK, and Russia. The PEI uses the identical core 49 items across all sub-national studies, to maintain comparability, but also supplements the core with specific items most relevant to each particular context, such as violence and crime in Mexico. The PEI has now been conducted three times in the US, in 2012 (at national level), in 2014 (covering 20 states) and 2016 (covering all states). When merged, this will allow comparison over time as well as across states.


For each country or state, the project identifies around forty election experts, defined as a political scientist (or other social scientist in a related discipline) who had demonstrated knowledge of the electoral process in a particular country (such as through publications, membership of a relevant research group or network, or university employment). It should be noted that this is far more than is conventionally used in comparable expert-based surveys, like V-Dem. For the global PEI, the selection has sought a roughly 50:50 balance between international and domestic experts, the latter defined by location or citizenship. Experts are asked to complete an online survey. In total, 2,417 completed responses were received in the PEI-4.5 survey, representing just under one third of the experts that the project contacted (29%). For the PEI-US-2016, the survey received over 700 responses.

It should also be noted that PEI contacts experts one month after each election, when judgments are likely to be stable and memories fresh. By contrast, other expert-based surveys ask respondents for judgments far longer after the event. For example, V-Dem asks its experts about electoral integrity in each country for every year since 1900, which we believe is not possible to assess with any degree of accuracy.


The idea of electoral integrity is defined by the project to refer to agreed international conventions and global norms, applying universally to all countries worldwide through the election cycle, including during the pre-election period, the campaign, on polling day, and its aftermath. [2]

What needs to be emphasized is that this new concept is far from equivalent to standard notions of liberal democracy. It remains difficult for scholars to break out of the familiar way of classifying regimes but instead the notion of electoral integrity is derived from international conventions and standards based on human rights. This provides a less tight theoretical concept but one which is both legitimate and authoritative for international programs of electoral assistance.


To measure this concept, the PEI survey questionnaire includes 49 items on electoral integrity ranging over the whole electoral cycle. These items fall into eleven sequential sub-dimensions.

Most media attention in detecting fraud focuses upon the final stages of the voting process, such as the role of observers in preventing ballot-stuffing, vote-rigging and manipulated results. Drawing upon the notion of a ‘menu of manipulation’,[3] however, the concept of an electoral cycle suggests that failure in even one step in the sequence, or one link in the chain, can undermine electoral integrity.

Unlike many other summary indices, the results of the PEI survey can be broken down in far more granular detail to pinpoint specific weaknesses and strengths in each contest. For example, the data can be used to compare how elections rate across eleven stages of the electoral cycle, and across 49 indicators, such as in the processes of district gerrymandering, the opportunities that contests provide for women and minority candidates, the provision of equitable access to political finance, the fairness of electoral officials, and the occurrence of peaceful and violent protests after the announcement of the results, and so on. This is essential for the correct diagnosis of any problems – and thus identifying the appropriate reforms needed to strengthen integrity.

The electoral integrity items in the survey were recoded so that a higher score consistently represents a more positive evaluation. Missing data was estimated based on multiple imputation by chained equations in groups composed of the eleven sub-dimensions. The Perceptions of Electoral Integrity (PEI) Index is then an additive function of the 49 imputed variables, standardized to 100-points. Sub-indices of the eleven sub-dimensions in the electoral cycle are summations of the imputed individual variables.[4]
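As a rough illustration of the additive construction described above, the following Python sketch rescales the sum of item scores to a 100-point range. The 1-to-5 item scale, the function name pei_index, and the min-max rescaling are assumptions made for illustration only; the text does not spell out the item scale or the exact standardization, and the codebook remains the authority. The imputation step is not shown.

```python
from typing import Sequence


def pei_index(items: Sequence[float], low: float = 1.0, high: float = 5.0) -> float:
    """Rescale the sum of item scores to a 0-100 range.

    Assumes every item is already recoded so that higher is better,
    already imputed, and scored on the same low..high scale
    (the 1-5 default is a hypothetical choice, not taken from the source).
    """
    if not items:
        raise ValueError("need at least one item score")
    if any(not (low <= x <= high) for x in items):
        raise ValueError("item score outside the assumed scale")
    n = len(items)
    total = sum(items)
    # Subtract the minimum possible sum, divide by the possible range.
    return 100.0 * (total - n * low) / (n * (high - low))


# 49 items, all at the midpoint of the assumed scale.
print(pei_index([3.0] * 49))  # 50.0
```

Under this rescaling every item carries equal weight, so a very low score on one item can be offset by modestly higher scores on the others.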

It could be suggested that the items should be weighted, for example by whether constitutional provisions or laws limit party competition. Nevertheless, legal restrictions are only one dimension of the procedures used to exclude or narrow party competition; in most electoral autocracies, today, multiple parties exist but there is no level playing field. Ruling parties limit opportunities for opposition forces through multiple mechanisms, whether blatant gerrymandering, intimidation and repression, patronage largess and corruption, or control over state media and public resources. The problems in Ethiopia, for example, differ sharply from those in Syria, Belarus, Haiti and Burundi, all countries with elections rated at the bottom by experts. Since different mechanisms are used in different states, each of these needs to be evaluated, and any single ‘break in the chain’ undermines integrity. Moreover, analyzing electoral integrity even in countries where there is no constitutional or legal right to organize political parties, or whether there remains very limited party competition, also provides an important benchmark to evaluate future developments in subsequent contests.

Confidence intervals

When interpreting the results, it should be noted that modest differences in the PEI index are unlikely to be statistically significant at reasonable confidence intervals. It is more useful to focus on the range of indicators across the cycle and more substantial differences among elections or among countries. Confidence intervals are constructed at the 95 per cent interval for the summary PEI index, based on the number of experts who responded for each election and country.
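To make the dependence on the number of experts concrete, here is a minimal Python sketch of a 95 per cent interval around the mean expert score for one election. It assumes a normal approximation (mean plus or minus z times the standard error); the text does not say which formula the project uses, and the function name mean_ci and the example scores are hypothetical.

```python
import math
import statistics


def mean_ci(scores, level=0.95):
    """Return (lower, upper) for the mean of expert scores.

    Normal-approximation interval: mean +/- z * s / sqrt(n).
    This is an illustrative assumption, not the project's documented method.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two expert scores")
    m = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return m - z * se, m + z * se


# Hypothetical scores: same two values, but 2 experts versus 20 experts.
two_experts = [60.0, 70.0]
twenty_experts = [60.0, 70.0] * 10

print(mean_ci(two_experts))
print(mean_ci(twenty_experts))
```

With the same spread of opinion, the interval for two experts is several times wider than the interval for twenty. With so few respondents a t-based interval would be wider still, so the normal approximation here is, if anything, optimistic.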

Validity and reliability tests:

The results have been tested for external validity (from sources of independent evidence), internal validity (consistency within the group of experts), and legitimacy (how far the results can be regarded as authoritative by stakeholders). The analysis, presented elsewhere, demonstrates substantial external validity when the PEI data is compared with many other expert datasets, as well as internal validity across the experts within the survey, and legitimacy as measured by levels of congruence between mass and expert opinions within each country. [5]

For external validity tests, the PEI Index was significantly correlated with other standard independent indicators contained in the 2016 version of the Quality of Government cross-national dataset. This includes the combined Freedom House/imputed Polity measure of democratization (R=.762** N. 151), and the Varieties of Democracy measure of electoral democracy (polyarchy) (R=.824**, N.140).[6]
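For readers less familiar with the statistic, the correlations reported above are of the kind computed in the following Python sketch of Pearson's R. The function name pearson_r and the two short lists of country scores are hypothetical; the actual tests use the PEI and Quality of Government datasets.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical index values for five countries on two measures.
pei_scores = [45.0, 52.0, 61.0, 70.0, 83.0]
other_index = [0.30, 0.55, 0.50, 0.80, 0.90]
print(round(pearson_r(pei_scores, other_index), 3))
```

Squaring the coefficient gives the share of variance two measures have in common: R=.824 corresponds to roughly .68, and R=.762 to roughly .58, so a strong overall correlation still leaves room for sizeable disagreement on individual cases.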

For internal validity purposes, several tests have been run with each release using OLS regression models to predict whether the PEI index varied significantly by several social and demographic characteristics of the experts, including sex, age, education, domestic and international institutional location, and familiarity with the election. In accordance with the findings from the previous versions, domestic experts and those reporting a higher level of familiarity with the election were significantly more positive in their evaluations, but other social characteristics were not significant predictors of evaluations.
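The kind of check described above can be sketched in simplified form. The Python example below fits an OLS regression with a single predictor, a 0/1 indicator for domestic experts, whereas the tests described in the text use several expert characteristics at once. The function name ols_simple and all data values are hypothetical.

```python
def ols_simple(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical expert-level data: 1 = domestic expert, 0 = international.
domestic = [0, 0, 0, 1, 1, 1]
pei = [50.0, 54.0, 52.0, 60.0, 62.0, 64.0]

intercept, slope = ols_simple(domestic, pei)
print(intercept, slope)
```

With a single 0/1 predictor, the intercept is the mean score among international experts and the slope is the gap between the domestic and international means, so a positive slope corresponds to domestic experts giving more positive evaluations. Judging whether such a gap is statistically significant requires standard errors, which this sketch does not compute.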

The PEI-4 Codebook provides detailed description of all variables and imputation procedures. A copy can be downloaded from the project website


The main PEI datasets are released on a bi-annual basis, as soon as they have been cleaned, so that they are available for secondary analysis by the community of users. The files are made available at country, election and expert levels along with the codebooks through the EIP Dataverse. They have been widely downloaded; for example, the PEI-US-2016 attracted over 1,100 downloads in two weeks. A detailed report is also published bi-annually in an accessible format for practitioners and journalists. This transparency is important for allowing multiple tests beyond the capacity of the research team.

For data, go to:


In short, the project has made considerable progress in developing the PEI methodology over the last five years and we are confident about the results. Nevertheless, there is always room for improvement, and, in particular, learning from comparisons across similar projects is very helpful to create a community or network. To this end, we organized workshops and panels last year with V-Dem at IPSA in Poznań and APSA in Philadelphia, bringing together representatives from major organizations generating political indices, including Freedom House, Polity IV, the Bertelsmann Institute, International IDEA, UNDP, the Manifesto Project and CHES. This dialogue can only benefit the process of generating reliable and valid indices, identifying best practices, as well as making the methodology more transparent.

This is only one part of the EIP project and we generally adopt mixed methods where we employ both elite and mass survey (WVS) data, as well as selected case studies, and qualitative methods.

The interest in PEI-US also demonstrates the need for scholars to think carefully about how social scientists can contribute evidence which is useful in the public debate about how to identify problems in electoral integrity and, then, what solutions might be most appropriate to overcome these. We seem to be heading into a fact-free zone where partisans assert that the world is flat but social science can still serve an important function in speaking truth to power, generating evidence of poor (and good) performance, and contributing towards the public sphere.


Benoit, Ken and Michael Laver. 2005. Party Policy in Modern Democracies. London: Routledge.

Bowler, Shaun; David Farrell and Robin Pettitt. 2005. ‘Expert opinion on electoral systems: So which electoral system is best?’ Journal of Elections, Public Opinion and Parties 15(1): 3-19.

Budge, Ian. (2000). ‘Expert judgments of party policy positions: Uses and limitations in political research.’ European Journal of Political Research 37(1): 103–113.

Transparency International. 2013. Corruption Perception Index.

Huber, John and Inglehart, Ronald. 1995. ‘Expert Interpretations of Party Space and Party Locations in 42 Societies’, Party Politics 1:73-111.

King, G. & Wand, J. (2007). Comparing incomparable survey responses: Evaluating and selecting anchoring vignettes. Political Analysis 15(1): 46–66.

Landman, Todd and Edzia Carvalho. 2010. Measuring Human Rights. London: Routledge.

Laver, Michael and Ben Hunt. (1992). Party and Policy Competition. London: Routledge.

Laver, Michael, Kenneth Benoit, and Nicolas Sauger. 2006. ‘Policy Competition in the 2002 French Legislative and Presidential Elections.’ European Journal of Political Research 45: 667-697.

Lilleker, Darren, Stetka, V. and Tenscher, J. 2015. Towards hypermedia campaigning? Perceptions of new media’s importance for campaigning by party strategists in comparative perspective. Information, Communication and Society, 18 (7), 747-765.

Mair, Peter. (2001). ‘Searching for the position of political actors: A review of approaches and a critical evaluation of expert surveys.’ In M. Laver (ed.), Estimating the policy positions of political actors. London: Routledge.

Martinez i Coma, Ferran; Van Ham, Carolien. 2015. ‘Can experts judge elections? Testing the validity of expert judgments for measuring election integrity.’ European Journal of Political Research 54 (2): 305-325.

McElroy, Gail and Kenneth Benoit. 2007. ‘Party groups and policy positions in the European Parliament.’ Party Politics 13:5-28.

Meyer, M. & Booker, J. (1991). Eliciting and analyzing expert judgment: A practical guide. London: Academic Press.

Norris, Pippa. 2014. Why Electoral Integrity Matters. NY: CUP.

Norris, Pippa. 2015. Why Elections Fail. NY: CUP.

Norris, Pippa. 2017. Strengthening Electoral Integrity. NY: CUP.

Norris, Pippa, Richard W. Frank and Ferran Martinez i Coma. 2014. Eds. Advancing Electoral Integrity. New York: Oxford University Press.

Norris, Pippa, Richard W. Frank and Ferran Martinez i Coma. 2015. Eds. Contentious Elections. NY: Routledge.

Norris, Pippa and Andrea Abel van Es. 2016. Checkbook Elections? NY: OUP.

Norris, Pippa, and Alessandro Nai. Eds. 2017. Election Watchdogs. NY: OUP.

O’Malley, Eoin. 2007. ‘The Power of Prime Ministers: Results of an Expert Survey’ International Political Science Review 28(1):7-27.

Pemstein, D.; Tzelgov, E.; Wang, Y. 2015. “Evaluating and Improving Item Response Theory Models for Cross-National Expert Surveys”. Varieties of Democracy Institute: Working Paper Series No. 1.

Saiegh, Sebastian. 2009. ‘Recovering a basic space from elite surveys: Evidence from Latin America.’ Legislative Studies Quarterly 34(1):117-145.

Schedler, Andreas. 2012. ‘Judgment and Measurement in Political Science’ Perspectives on Politics 10(1):21-36.

Steenbergen, Marco R. and Gary Marks. 2007. ‘Evaluating expert judgments.’ European Journal of Political Research 46: 347–366.

Teorell, Jan. Carl Dahlström & Stefan Dahlberg. 2011. The QoG Expert Survey Dataset. University of Gothenburg: The Quality of Government Institute.

Warwick, Paul. 2005. ‘Do Policy Horizons Structure the Formation of Parliamentary Governments?: The Evidence from an Expert Survey’ American Journal of Political Science 49(2):373-387.

[1]     In addition, in 2014 elections in Haiti, Lebanon, and Comoros were delayed or suspended. Those are thus not included in the dataset. The election in Thailand was held and later annulled. There were also elections in North Korea and Trinidad and Tobago, but with too few responses in these two cases meant that these are excluded from the dataset.

[2]     Pippa Norris. 2013. ‘The new research agenda studying electoral integrity.’ Special issue of Electoral Studies 32(4).

[3]     Andreas Schedler. 2002. ‘The menu of manipulation.’ Journal of Democracy 13(2): 36‐50.

[4]     See the codebook for further information.

[5]     Pippa Norris, Ferran Martinez i Coma and Richard Frank. 2013. ‘Assessing the quality of elections.’ Journal of Democracy. 24(4): 124-135; Pippa Norris, Richard W. Frank and Ferran Martinez i Coma. 2014. Eds. Advancing Electoral Integrity. New York: Oxford University Press; Ferran Martínez i Coma and Carolien Van Ham. 2015. ‘Can Experts Judge Elections? Testing the Validity of Expert Judgments for Measuring Election Integrity’. European Journal of Political Research doi:10.1111/1475-6765.12084; Pippa Norris, Richard W. Frank and Ferran Martínez i Coma. 2014. ‘Measuring Electoral Integrity around the World: A New Dataset’ PS: Political Science & Politics, 47(4): 789-798.

[6]     Jan Teorell, Stefan Dahlberg, Sören Holmberg, Bo Rothstein, Felix Hartmann and Richard Svensson. January 2016. The Quality of Government Standard Dataset, version Jan16. University of Gothenburg: The Quality of Government Institute.


  1. Nick says:

    “There were also elections in North Korea and Trinidad and Tobago, but with too few responses in these two cases meant that these are excluded from the dataset.”

    But if you look at the Year in Elections 2014 p.36, you see that Mauritania, Slovenia and North Korea all had 2 respondents for a 6% response rate, yet only North Korea was dropped from The Year in Elections 2015. There were more countries with only three or four respondents in there too.

  2. Untenured and thus anonymous says:

    This is a joke, right?

    At no point does Pippa answer any of the criticisms raised regarding her half-decade long project (for example, the complete lack of face validity). The only two mentions of North Carolina are to Reynolds’s place of employment, and North Korea is not mentioned save for a passing aside in a footnote.

    Furthermore, in doing so, it seems there is a complete lack of understanding of what contemporary “best practices” are even in the rather sad realm of expert surveys in political science: references are made to V-Dem, but at no point does it appear there is a recognition that additive indices of latent political phenomena have been shown for decades to be highly problematic at best, and fundamentally flawed at worst.

    The conclusion is, perhaps, my favorite part: “In short, the project has made considerable progress in developing the PEI methodology over the last five years and we are confident about the results. Nevertheless, there is always room for improvement, and, in particular, learning from comparisons across similar projects is very helpful to create a community or network.”

    One makes the following inferences. First, we’ve made progress developing a methodology (which completely lacks face validity, but hey it’s developing!). Second, we’re confident about the results (even though they fly in the face of pretty much everything comparativists know about elections across the globe). Third, we’re creating a community and networking, which is, after all, so much more important for getting publications and citations than actually developing anything resembling a decent measure.

    If this is the best an entire research team can do when people point out the shambles that is their measure, it is most telling. The emperor, as usual, has no clothes.

  3. Pippa Norris says:

    We dropped North Korea in 2015 because of the respondents, which is I guess what you suggest that we should do, so I am really unsure why you continue to flog this dead horse.

    • Andrew says:


      My concerns about North Korea are twofold: First that you and your colleagues released that earlier report in which North Korea was rated over 50 on all dimensions for 2014, and that didn’t seem to be a problem at the time. Second that the North Korea numbers were created using the same approach as used for all other countries. Given that we can all agree that North Korea’s numbers were problematic, this calls into question the method more generally.

    • Rahul says:


      Allow me to offer an example. Say I am a doctor who designs an ER triage protocol, i.e., one where, based on symptoms, a nurse can decide which patients need the most urgent attention.

      Now someone points out that this algorithm gives the symptoms of a torn ACL higher priority than an apparent heart attack.

      Can I now just brush the criticism over by adding a Disclaimer: “This triage algorithm does not work correctly for heart attacks”?

  4. Nick says:

    “We dropped North Korea in 2015 because of the respondents, which is I guess what you suggest that we should do, so I am really unsure why you continue to flog this dead horse.” So you’re now admitting you dropped North Korea because the respondents gave ridiculous answers? Doesn’t doing that call into question the scientific basis of the entire survey concept? And how is straight up deleting an inconvenient data point good scientific ethics?

  5. Andrew says:


    Also I want to thank you for commenting here. We disagree on some of these measurement issues but I think the way to move forward scientifically is through open discussion, and I appreciate your openness in posting your reports and data online, and in engaging with critics here.

    • Andrés Ceballos says:

      I agree. Thank you so much for starting this debate. If only the academy could come down from the Ivory Tower and meet activists down at the grass roots to evaluate the state of our democracy, we may on the one hand put these theories to work and at the same time get feedback from all those experts working on the ground. An organization in Colombia called the Electoral Observation Mission has a 35 variable index to measure their democracy. However the formula is fluid because political circumstances are always changing, especially in the midst of a peace process. The point is that these methods are only useful when they contribute to more transparent elections, to better media coverage and more informed citizens. Sometimes academic rigor will suffer in the name of political expediency, but that is an issue we need to deal with slowly.

  6. Nero says:

    “Validity and reliability tests:” looks reasonable, with no apparent large mistakes? Nevertheless, some terrible results were obtained with these reliable & valid measurements. Is there a basic problem with the approach used for assessing validity and reliability, which also calls into question validity and reliability checks in other studies? Or, does R=0.8 simply mean that in some cases extremely wrong assessments of single nations will quite necessarily occur?

  7. Erik Moeller says:

    I agree it would be good to know a bit more about circumstances under which responses are dropped (is there a response rate threshold?). Cuba is another example where the data at face value contradicts what’s well-known: the National Assembly elections have one candidate per seat due to the way the nominating process works; significant political dissent is prohibited. That’s not a system with high integrity. So why does it receive a rating in the 60s? Is the problem with the criteria used in the index, or is it with the experts consulted?

    Also, shouldn’t experts have demonstrated knowledge in more than one country? It’s difficult to estimate the fairness of a process if you’re only familiar with how your country is doing it. And of course there’s the hairy issue of the level of academic freedom in a country, which may significantly affect responses from experts in that country.

  8. Tom Passin says:

    “domestic experts and those reporting a higher level of familiarity with the election were significantly more positive in their evaluations”

    This statement really bothers me, partly because it wasn’t reported that anything was done with the information. Should domestic experts have been downweighted? Or maybe upweighted? Does it mean that domestic North Korean experts were more relied on to assess N. K. election processes? Isn’t it likely that in states with State-directed elections that the domestic experts would be more favorable than outsiders? Or if not, how can that be reliably established?

    What I read in the description of how the index is constructed is that reliance is placed on aggregation. But to increase the scientific reliability, one should be looking hard at the exceptions. N.K. is one prominent one, but clearly not the only one. Excluding it doesn’t resolve the matter. Why did it give what most would consider spurious results? And if it couldn’t be flagged as spurious until after 2014, why should we have much confidence now in the current version of the index? And how can we now tell which remaining countries in the index have similarly spurious results?

  9. Tom Passin says:

    To provide a little wider look, the published spreadsheets do provide both confidence levels (at least for some of the indexes) and number of responders. Some U.S. states are given very wide confidence bands (around 43-83, for one example), and so are North Korea and Cuba (in the 2014 results). There is a relation between a low number of responding experts and the confidence bands, but it’s not a simple one. Still, all the cases of very wide confidence bands I noted had only 2 or 3 people responding to the survey. (These seem to be confidence bands for one of the sub-indices; I’m still trying to understand what all of the spreadsheet columns represent).

    I suppose that if we weighted the mean values by the inverse squares of the confidence intervals, we wouldn’t put much stock in results like the N.K. and Cuba values. Nor, apparently, in the results for certain of the U.S. states either.
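
The weighting idea in the comment above can be sketched in a few lines. This is a minimal illustration, not anything the PEI project is known to do: it assumes every confidence band is reported at the same confidence level, so that band width is proportional to the standard error, and it takes "inverse squares of the confidence intervals" to mean weights proportional to 1 / width². The function names and the first score and band are hypothetical; only the 43-83 band echoes the example mentioned in the comment.

```python
def inverse_variance_weights(ci_bounds):
    """Return normalized weights proportional to 1 / (CI width)^2.

    ci_bounds is a list of (lower, upper) pairs, assumed to share one
    confidence level so that width is proportional to standard error.
    """
    raw = [1.0 / (upper - lower) ** 2 for lower, upper in ci_bounds]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(scores, ci_bounds):
    """Average the scores, discounting those with wide confidence bands."""
    weights = inverse_variance_weights(ci_bounds)
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical inputs for illustration; not taken from the PEI spreadsheets.
scores = [70.0, 63.0]
ci_bounds = [(65.0, 75.0), (43.0, 83.0)]  # widths of 10 and 40 points

weights = inverse_variance_weights(ci_bounds)
pooled = weighted_mean(scores, ci_bounds)
```

Because the second band is four times as wide as the first, it receives one sixteenth of the first band's weight, which is the sense in which results based on 2 or 3 responders would count for very little. Note that this handles only sampling variability; it does nothing about biased responses.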

  10. Jonathan says:

    Why not exclude those where upper houses aren’t elected? Britain has the unelected Lords and they bluntly stuck their noses into many issues over the last year, let alone the past few years.

  11. Jonathan (another one) says:

    “We seem to be heading into a fact-free zone where partisans assert that the world is flat but social science can still serve an important function in speaking truth to power, generating evidence of poor (and good) performance, and contributing towards the public sphere.” Sure, or it can be used to *exacerbate* the problem of false claims. Dr. Norris fails to answer a simple question: does she think PEI supports (worse still, forms the entire basis of) the claim North Carolina is not a democracy? If she does, is the fact-free fantasy she proposes to correct the widespread notion that it is?

  12. Jon says:

    Wouldn’t any expert from North Korea who would make his country look bad be kept from being in a position to submit?

  13. Fafa says:

    I’m glad to see that Dr. Norris has provided some input, but this lengthy restatement of the methodology of the PEI is, at best, a heavily footnoted non-sequitur, and, at worst, a delusory smokescreen.

    Dr. Norris writes, “Andrew Gelman highlights several questions about the methods which are used. This note provides a brief response to both issues”. Yet no response is forthcoming to the key questions:
    (a) Given the results obtained thus far, can indexing expert opinion in this way provide any useful summary information on comparative or absolute measures of electoral reliability or democratic function?

    (b) If the answer to (a) is ‘yes’, how can the results with respect to (e.g.) North Korea be justified? In what sense does the ‘upward bias’ of the N. Korea outcome represent a systemic issue that explains the results from other countries? etc….

    To say, without comment or elaboration, that “domestic experts and those reporting a higher level of familiarity with the election were significantly more positive in their evaluations”, but later assert that somehow the PEI is helping social scientists ‘speak truth to power’ is an abdication of responsibility. Perhaps start with speaking truth to one’s own results by plainly discussing the implications of the threats to validity.

    Again, I’m happy that Dr. Norris commented, but the substance of the comment is a giant red flag.

    • Old Europe mixed-methods chap says:

      I’m afraid that I agree with the two questions formulated in that statement.

      Someone commented on Andrew’s previous post to say that the kind of rating exercise performed by the EIP team needs some kind of adjustment (via relative indexation) to become somewhat accurate. Pippa Norris answered to say that this method has its own flaws, yet she has not established the superiority of the unadjusted measurements. Worse, the project documentation does not even indicate what adjustment methods were considered, if any.

      That is sloppy at best, and it will be used in the worst ways by both hardcore ‘quants’ and hardcore ‘quals’ to dismiss the entire field of research that EIP fits in.

  14. Pippa Norris says:

    Dear Colleagues

    We have published about our methods over the years and obviously a short blog post cannot cover all these points. I welcome the interest in our work, however, as I have been actively trying to further discussion about the construction and use of expert indices, including with IPSA and APSA workshops I organized last year. As well as the details in the technical appendices in our annual reports and dataset codebooks, let me point you towards publications where you can read about EIP’s methods in more depth by colleagues in the team, including discussions of cross-national comparability, reliability and validity checks, and also see some of the papers we brought together to discuss these issues in our meetings:

    Norris, Pippa, Richard W. Frank and Ferran Martínez i Coma. 2014. Eds. Advancing Electoral Integrity. New York: Oxford University Press. Chapters 3 and 4.

    Norris, Pippa. 2014. Why Electoral Integrity Matters. New York: Cambridge University Press. Chapter 3

    Norris, Pippa. 2015. Why Elections Fail. New York: Cambridge University Press. Chapter 2

    Norris, Pippa, Ferran Martínez i Coma, and Richard W. Frank. 2013. ‘Assessing the quality of elections.’ Journal of Democracy. 24(4): 124-135.

    Norris, Pippa. 2013. ‘Does the world agree about standards of electoral integrity? Evidence for the diffusion of global norms.’ Special issue of Electoral Studies 32(4):576-588.

    Norris, Pippa. 2013. ‘The new research agenda studying electoral integrity’. Special issue of Electoral Studies 32(4): 563-575.

    Norris, Pippa, Richard W. Frank and Ferran Martínez i Coma. 2014. ‘Measuring electoral integrity: A new dataset.’ PS: Political Science & Politics 47(4): 789-798.

    Martínez i Coma, Ferran and Carolien Van Ham. 2015. ‘Can experts judge elections? Testing the validity of expert judgments for measuring election integrity’. European Journal of Political Research 54(2): 305-325. doi:10.1111/1475-6765.12084.

    Here are the papers from the one-day workshop we organized with V-Dem in Philly with APSA 2016 on the topic of expert indices:

    Here are links to the panel we organized with V-Dem in Poznan last year on measuring and using expert indices. These papers are currently under review.

    1. Aggregating to Democracy: Generating Indices from Expert-Coded and Paired Comparison Data
    Prof. Svend-Erik Skaaning, Dr. Brigitte Zimmerman, Dr. Michael Coppedge, Dr. Staffan Lindberg

    2. Do experts judge elections differently in different contexts? The cross-national comparability of expert judgments on election integrity
    Dr. Carolien Van Ham, Dr. Ferran Martinez i Coma

    3. Complementary Strategies of Validation: Assessing the Validity of V-Dem Corruption Measures
    Prof. Jan Teorell, Dr. Kelly McMann, Dr. Brigitte Zimmerman, Dr. Dan Pemstein

    4. Do experts know how much they know? Do statistical models? Do we care?
    Dr. Kyle Marquardt, Dr. Dan Pemstein, Mr. Eitan Tzelgov

    Link to download via

    I hope that you find these readings useful for further information and we always warmly welcome constructive suggestions to improve our work. It has been striking to me how the use of these expert-based measures has expanded by leaps and bounds in the social sciences and yet the methodological discussion has lagged far behind. Let’s work together on these issues as a community of scholars.

    • darosenthal says:

      That response was a horrid slog. I’ve rarely encountered a more densely compacted layer of polysyllabic obfuscation in the service of clarification than what I’ve read here. I feel like a man who, knee deep in quicksand, has been handed an anvil. Everyone involved with this nonsense should be sent to a remedial communication camp for a semester of hard writing.

      • Rahul says:

        It is no worse than this, their fundamental definition of what they are trying to measure:

        “The idea of electoral integrity is defined by the project to refer to agreed international conventions and global norms, applying universally to all countries worldwide through the election cycle, including during the pre-election period, the campaign, on polling day, and its aftermath.”

        If that’s how they define “electoral integrity” the project is doomed from the start.

    • Nick says:

      Let’s look at the paper ‘Do experts judge elections differently in different contexts? The cross-national comparability of expert judgments on election integrity’. The authors look at data from the pilot study for PEI and note that some countries (including some of the more *surprising* scores like Kuwait and Romania) have higher expert disagreement than others, but they never discuss the obvious implication: if experts disagree, then how can the data PEI accepts from these tiny samples (median of n=11, but some elections have n=2 or n=3) be statistically meaningful? Would a journal publish a survey on the electoral integrity of Cuba that had n=3 respondents? But apparently if you staple together 213 such surveys, then they do.

    • Mark P. says:

      Look, you still haven’t acknowledged that you messed up and that North Korea in no way deserves the ratings that it got. You keep trying to toss shovelfuls of irrelevancy over the basic issue, but that basic issue refuses to be buried.

    • Raina says:

      For Pete’s sake, Pippa. I’m an academic. This response (and the one above) might fly in a review response, but it’s not going to do anything for the public but make them dismiss you. No one cares how many reams you and your colleagues have published if you can’t BRIEFLY explain and defend the basic principles of your work to non-experts.

      • Tyler says:

        Exactly, just answer the questions! I’m a researcher and epidemiologist and have done both qualitative and quantitative research. I looked for anything resembling a response to Gelman’s article, and there is none. It took a lot of words and effort to provide a non-answer.

    • Andrew says:


      Damn! And this happened after my post. That’s just terrible: millions of people will read that New York Times article.

      Now I have to partially retract my P.S. here, where I wrote:

      But the good news is that the usual suspects such as ABC, NBC, CBS, CNN, NPR, BBC, NYT didn’t fall for it. I give these core media outlets such a hard time when they screw up, and they deserve our respect when they don’t take the bait on this sort of juicy, but bogus, story.

      Let’s just hope NPR isn’t next.

      On the plus side, Slate ran my article even though it directly contradicted something they’d posted earlier. I’ll email Eduardo Porter and see if he can run a correction tomorrow.

  15. Eli Poupko says:

    I would argue that the methodological error here can be traced to the second excluded category, namely states “without de jure direct (popular) elections for the lower house of the national legislature.” Instead, the survey should arguably have excluded states without de facto popular elections. This is admittedly somewhat more difficult to measure, but I don’t think excluding only states lacking de jure elections (and including those that clearly lack de facto popular–i.e. minimally democratic–elections) is justified in this analysis, especially if it is going to be used to draw comparative conclusions about democratic legitimacy in different states.

    • Nick says:

      Shouldn’t a useful measure be able to distinguish fake elections in a dictatorship from slightly flawed elections in a non-dictatorship?

    • Andrew says:


      That’s one error. But another error, I think, is the reliance on experts. What does it mean if you ask 40 experts and only 2 or 3 respond?

      There are a couple of ways of looking at this.

      One perspective is to say that an expert is an expert, and even one expert is enough to tell us what’s going on in the state or country. But if that’s the case, why survey 40 per country? Why not just get one or two in each, and stop there?

      The other perspective is to recognize the responses as subjective and variable. But if that’s the case, the issue isn’t just getting a large enough N to get a small standard error (as it seems Norris is implying). Once you accept the extreme subjectivity of responses (e.g., respondents who think North Korea has above-average electoral integrity or who think that North Carolina is less democratic than Cuba), then you have to be concerned about bias, both in the responses and in the sample. What’s the population being surveyed, are the respondents representative of that population, are they giving valid responses, etc?

      I didn’t go into the above points in my original post on this survey because it all seemed so obvious. But I guess that’s the problem with our statistics and methods teaching: we go into mind-numbing detail on variance formulas etc., but not enough on basic principles of measurement.
