Using Nationwide Voter Files to Study the Effects of Election Laws

By Bernard Fraga, John Holbein, and Christopher Skovron 07.27.2018

The MIT Election Data and Science Lab helps highlight new research and interesting ideas in election science, and is a proud co-sponsor of the Election Sciences, Reform, & Administration Conference (ESRA).

Bernard L. Fraga, John B. Holbein, and Christopher Skovron recently presented a paper at the 2018 ESRA conference entitled “Using Nationwide Voter Files to Study the Effects of Election Laws.” Here, they summarize the analysis from that paper.

Datasets derived from voter files have become a common source of information for researchers studying voting behavior. Despite the advantages these data offer, however, survey-based analyses continue to dominate the study of election laws.

While concerns about cost and availability are paramount for researchers who are considering the use of voter file data, less attention has been paid to the methodological advantages and challenges of using voter lists for elections research. In this paper, we outline the various methodological considerations encountered when using voter file data to estimate the causal effect of state-level election laws on voter turnout.

In our paper, we begin by outlining the current state of the literature on the effects of election laws on turnout, paying close attention to the data sources used in these studies. Surveys continue to dominate the study of election laws, while the broader study of turnout is in the midst of a transition to analyses based on voter registration lists, or “voter files.” (For a full list of references and examples of the literature we cite, please see the original paper.)

There are many reasons for this transition, including, but not limited to, problems with using self-reported rather than validated voting, the difficulty of examining subgroups of interest even in large surveys, and the trend toward design-based inference (including experimental methods) in studies of voter behavior. Voter file-based analyses may be particularly appropriate for studying election laws and turnout, because state- or county-level treatments make these issues even more pressing.

The shift to voter file-based analyses helps address some well-known problems in survey-based research. One is misreported (generally over-reported) turnout, which is broadly understood to have a substantial impact on survey-based estimates of how many individuals participate and of which factors predict participation. Matching survey respondents to voter file records to validate self-reported turnout is one common response to this problem, as in Hajnal, Lajevardi, and Nielson (2017). However, issues remain even when high-quality surveys such as the American National Election Studies or the Cooperative Congressional Election Study are matched to voter records. More broadly, survey data is usually cross-sectional, which makes estimating valid causal effects, especially cross-state effects, extremely difficult.

Voter files themselves are no panacea. Voter file-based studies are often restricted to one or a few states where voter file data is available and not prohibitively expensive. They also often rely on third-party vendors, who generally provide samples from their databases or counts aggregated above the level of the individual voter. However, the broader academic community is likely to gain more consistent access to complete nationwide voter files soon. In our paper, we argue that nationwide voter files offer many opportunities to produce new insights into the effects of election laws on turnout, but that analysts need to be aware of important concerns and limitations when working with these new data sources.

We introduce these concerns while describing the features of a national, comprehensive voter file compiled by the Data Trust, a data vendor. Drawing on individual-level records for more than 200 million registered voters, we assess the quality of the dataset and its accompanying measures, both modeled and from administrative sources. Our checks reveal that the Data Trust voter file produces estimates of total registered voters that track well with other vendors' counts, as well as reasonably precise estimates of total votes cast in each state going back to 2006.
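The sketch below illustrates this kind of benchmark check. It compares vote counts tallied from a voter file with official state totals and flags state-years where the two diverge by more than a chosen tolerance. The function name, the 5 percent default tolerance, and the example counts are all hypothetical, and the paper's exact procedure may differ. A voter file is a snapshot of current registrants, so people who have since moved or been removed from the rolls may be missing. For that reason alone, counts for older elections can fall short of official totals.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (state, election_year) -> number of votes cast.
Counts = Dict[Tuple[str, int], int]


def flag_discrepancies(
    file_counts: Counts, official_counts: Counts, tolerance: float = 0.05
) -> List[Tuple[str, int, float]]:
    """Return (state, year, ratio) for every state-year whose voter-file count
    differs from the official count by more than `tolerance` (a proportion)."""
    flagged = []
    for (state, year), official in sorted(official_counts.items()):
        if official <= 0:
            continue  # no usable benchmark for this state-year
        # A state-year absent from the file is treated as zero votes, so it is flagged.
        ratio = file_counts.get((state, year), 0) / official
        if abs(ratio - 1.0) > tolerance:
            flagged.append((state, year, ratio))
    return flagged


# Made-up counts for two hypothetical states, "A" and "B".
official = {("A", 2016): 4_700_000, ("B", 2016): 5_500_000}
from_file = {("A", 2016): 4_650_000, ("B", 2016): 5_000_000}
print(flag_discrepancies(from_file, official))
```

A flagged state-year is a prompt for closer inspection, not proof that the file is unusable for that state.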

A major limitation of voter file data, compared with surveys, is that not all states' voter files record registrants' race/ethnicity. Instead, vendors like Data Trust model race using voters' names and other information about them. We find that these modeled race estimates correlate very strongly with the Census' race estimates at the state and county level. We also propose methods that academic researchers can use to model voters' race in commercial data independently, rather than relying on vendors' proprietary modeling.
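Bayesian Improved Surname Geocoding (BISG) is one widely used, published approach to this kind of modeling. It combines the racial distribution associated with a surname (for example, from the Census surname list) with the racial composition of the voter's Census geography. The sketch below shows only its core Bayes-rule update. It is not necessarily the method the paper proposes or the one Data Trust uses. Real applications use more categories, and the four-category probability tables here are hypothetical inputs.

```python
from typing import Dict

Dist = Dict[str, float]  # category -> probability


def bisg_posterior(p_race_given_surname: Dist, p_race_given_geo: Dist, p_race: Dist) -> Dist:
    """BISG-style update:
    P(race | surname, geo) is proportional to P(race | surname) * P(race | geo) / P(race),
    which assumes surname and location are independent conditional on race.
    `p_race` is the overall distribution the surname table was built from."""
    unnormalized = {
        race: p_race_given_surname[race] * p_race_given_geo[race] / p_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    return {race: value / total for race, value in unnormalized.items()}


# Hypothetical inputs for one voter.
surname = {"white": 0.5, "black": 0.3, "hispanic": 0.1, "asian": 0.1}
tract = {"white": 0.2, "black": 0.6, "hispanic": 0.1, "asian": 0.1}
national = {"white": 0.6, "black": 0.13, "hispanic": 0.18, "asian": 0.09}
posterior = bisg_posterior(surname, tract, national)
print({race: round(p, 3) for race, p in posterior.items()})
```

The update assumes that surname and location are independent given race, so it can be miscalibrated where that assumption fails. This is one reason to validate modeled race against Census aggregates, as described above.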

We also analyze missingness by state in variables that are important for studying turnout, including registration date and birth date. We note potential problem areas but find that, overall, the data appear valid enough to provide accurate estimates once some of these missing values are taken into account. We propose our data checks as a non-exhaustive but necessary set of checks that analysts should run on voter file data before using it in academic research.
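A per-state missingness summary of the kind described above might be computed as follows. The sketch makes several assumptions:

- Each voter record is a dictionary with a `state` key.
- `None` or an empty string counts as missing.
- The field names (`registration_date`, `birth_date`) are hypothetical.
- The flagging threshold is arbitrary.

Real vendor schemas will differ.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Mapping, Optional


def missingness_by_state(
    records: Iterable[Mapping[str, Optional[str]]], fields: List[str]
) -> Dict[str, Dict[str, float]]:
    """Share of records in each state with a missing value for each field."""
    totals: Counter = Counter()
    missing: Dict[str, Counter] = defaultdict(Counter)
    for rec in records:
        state = rec["state"]
        totals[state] += 1
        for field in fields:
            if rec.get(field) in (None, ""):
                missing[state][field] += 1
    return {
        state: {field: missing[state][field] / n for field in fields}
        for state, n in totals.items()
    }


def flag_states(shares: Dict[str, Dict[str, float]], threshold: float = 0.1) -> List[str]:
    """States where any field's missing share exceeds `threshold`."""
    return sorted(
        state for state, by_field in shares.items()
        if any(share > threshold for share in by_field.values())
    )


# Toy records for two hypothetical states.
records = [
    {"state": "A", "registration_date": "2010-03-01", "birth_date": "1970-01-01"},
    {"state": "A", "registration_date": None, "birth_date": "1985-06-15"},
    {"state": "B", "registration_date": "2012-09-10", "birth_date": ""},
    {"state": "B", "registration_date": "2014-05-20", "birth_date": "1960-11-30"},
    {"state": "B", "registration_date": "2016-10-01", "birth_date": "1990-02-02"},
    {"state": "B", "registration_date": "2008-07-07", "birth_date": "1955-08-08"},
]
shares = missingness_by_state(records, ["registration_date", "birth_date"])
print(shares)
print(flag_states(shares, threshold=0.3))
```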

We conclude by proposing a set of methods well suited to studying election laws and turnout with national voter file data. The size of the data and its individual-level, over-time structure make possible methods that are not appropriate for smaller datasets or for cross-sectional data. First, we suggest that researchers use the size of the data to study smaller population subgroups, such as narrow intersections of race and age. In doing so, we encourage researchers to create pre-analysis plans and preregister their hypotheses, to avoid fishing for significant results or hypothesizing after the results are known. Corrections for multiple comparisons are also appropriate in this context. In addition, analysts can use the over-time nature of voter file data to extend this analysis by including flexible time trends for each state.
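The summary does not prescribe a particular multiple-comparison correction. One standard option is Holm's step-down adjustment, sketched below, which controls the family-wise error rate across a set of subgroup tests. When many subgroups are tested, a false discovery rate procedure such as Benjamini-Hochberg is a common alternative. The p-values in the usage example are made up.

```python
from typing import List


def holm_adjust(p_values: List[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order.
    Rejecting where the adjusted value is below alpha controls the
    family-wise error rate at alpha across all tests in the list."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values from four race-by-age subgroup tests.
print(holm_adjust([0.01, 0.04, 0.03, 0.005]))
```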

We also suggest that researchers go beyond the traditional difference-in-differences models used to analyze state-level policy change and exploit the individual-level panel structure of voter file data by estimating models with individual fixed effects. Because these models leverage changes in each individual's turnout over time, they can account for a host of unobserved, time-invariant individual characteristics that two-way fixed effects or state time trends may not capture.
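A two-way within transformation shows the individual fixed effects idea. Subtract each voter's mean and each election's mean from both turnout and treatment, then regress demeaned turnout on demeaned treatment. In a balanced panel, this yields the same coefficient as least squares with voter and election dummies. This is an illustrative sketch, not the paper's estimator, and it has several limitations:

- Real voter files form unbalanced panels, because voters register, move, and die.
- State time trends are omitted.
- Inference would typically require standard errors clustered by state, which a dedicated fixed effects package provides.

The panel layout and function names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical layout: (voter_id, election_year) -> value.
Panel = Dict[Tuple[str, int], float]


def two_way_demean(panel: Panel) -> Panel:
    """Subtract voter means and election means, then add back the grand mean.
    This removes voter and election fixed effects only in a balanced panel."""
    ids = sorted({i for i, _ in panel})
    years = sorted({t for _, t in panel})
    if len(panel) != len(ids) * len(years):
        raise ValueError("this sketch requires a balanced panel")
    grand = sum(panel.values()) / len(panel)
    id_mean = {i: sum(panel[(i, t)] for t in years) / len(years) for i in ids}
    year_mean = {t: sum(panel[(i, t)] for i in ids) / len(ids) for t in years}
    return {(i, t): v - id_mean[i] - year_mean[t] + grand for (i, t), v in panel.items()}


def individual_fe_estimate(turnout: Panel, treated: Panel) -> float:
    """Treatment coefficient in a linear model with voter and election fixed effects."""
    if set(turnout) != set(treated):
        raise ValueError("turnout and treatment must cover the same voter-elections")
    y = two_way_demean(turnout)
    d = two_way_demean(treated)
    denom = sum(d[k] ** 2 for k in d)
    if denom == 0:
        raise ValueError("no treatment variation remains after removing fixed effects")
    return sum(d[k] * y[k] for k in d) / denom


# Synthetic, noiseless data: voters "a" and "b" live in a state that adopts a law in 2014.
voter_effect = {"a": 0.2, "b": 0.5, "c": 0.3, "d": 0.7}
year_effect = {2012: 0.1, 2014: -0.2, 2016: 0.15}
treated = {
    (i, t): 1.0 if i in {"a", "b"} and t >= 2014 else 0.0
    for i in voter_effect for t in year_effect
}
turnout = {k: voter_effect[k[0]] + year_effect[k[1]] + 0.05 * treated[k] for k in treated}
print(individual_fe_estimate(turnout, treated))  # recovers the built-in 0.05 up to rounding
```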

The size of the dataset also makes precise individual-level matching analyses feasible. Because voters are geolocated, researchers can also use geographic regression discontinuity designs to estimate (under appropriate assumptions) the effects of policies by comparing voters who live near borders.
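A stripped-down version of this border comparison is a local linear regression discontinuity in signed distance to the border. The approach fits a line to turnout on each side, within a bandwidth, and takes the gap between the two lines at distance zero. The sketch assumes that signed distances have already been computed from geocoded addresses, with negative values on the side without the policy, and it uses a uniform kernel. A full geographic design would treat the two-dimensional border more carefully and would need valid standard errors. The function names and synthetic data are hypothetical.

```python
from typing import List, Sequence, Tuple


def _ols_intercept(points: List[Tuple[float, float]]) -> float:
    """Value at x = 0 of the least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct distances on each side")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return mean_y - slope * mean_x


def border_rd_estimate(
    signed_distance_km: Sequence[float], turnout: Sequence[float], bandwidth_km: float
) -> float:
    """Local linear RD with a uniform kernel: the jump in fitted turnout at the border."""
    pairs = list(zip(signed_distance_km, turnout))
    control = [(x, y) for x, y in pairs if -bandwidth_km <= x < 0]
    treated = [(x, y) for x, y in pairs if 0 <= x <= bandwidth_km]
    if not control or not treated:
        raise ValueError("no observations within the bandwidth on one side")
    return _ols_intercept(treated) - _ols_intercept(control)


# Synthetic, noiseless data with a 0.04 jump at the border.
distances = [float(x) for x in range(-5, 6)]
rates = [0.5 + 0.01 * x + (0.04 if x >= 0 else 0.0) for x in distances]
print(border_rd_estimate(distances, rates, bandwidth_km=5.0))
```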

Voter file data is not perfect. Voter files contain only registrants, and because some election laws may affect registration as well as turnout, naive use of voter file data can produce biased estimates of treatment effects. To assess the extent of this differential registration bias, researchers using voter files to estimate the effects of election laws should explicitly examine registration rates as an outcome, and should consider sensitivity analyses to gauge how likely it is that differential registration bias affects their results.
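A voter file contains only registrants, so a registration-rate outcome needs a denominator from outside the file, such as a Census estimate of the citizen voting-age population. The minimal sketch below assumes such a denominator is available. It computes registration rates and then a simple two-group, two-period difference-in-differences on those rates. A clearly nonzero estimate suggests the law changed who appears in the file, which is exactly the situation in which turnout estimates among registrants can be biased. The counts are hypothetical.

```python
def registration_rate(registrants: int, eligible_population: int) -> float:
    """Share of an externally estimated eligible population that appears in the voter file."""
    if eligible_population <= 0:
        raise ValueError("eligible population must be positive")
    return registrants / eligible_population


def diff_in_diff(treated_pre: float, treated_post: float,
                 control_pre: float, control_post: float) -> float:
    """Change in the treated group minus change in the comparison group."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical counts: a state adopting a law versus a comparison state.
effect = diff_in_diff(
    treated_pre=registration_rate(600_000, 1_000_000),
    treated_post=registration_rate(700_000, 1_000_000),
    control_pre=registration_rate(550_000, 1_000_000),
    control_post=registration_rate(580_000, 1_000_000),
)
print(f"difference-in-differences estimate for registration rate: {effect:.3f}")
```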

Our paper sets an agenda for the academic study of election laws' effects on turnout that takes advantage of nationwide voter files, while ensuring that analysts are transparent about the validity of their data and the assumptions underlying their statistical tests. Nationwide voter files offer tremendous value for addressing some of the problems that survey data cannot. These datasets are not perfect, and researchers should exercise care before charging headlong into analyses estimating the effects of election laws. However, using these datasets for this purpose seems reasonable when paired with the checks we have recommended here and with a commitment to transparency in research practices. With these precautions in place, the next logical step is to expand the use of nationwide voter files to estimate the effects of election laws on turnout.

Bernard Fraga is Assistant Professor of Political Science at Indiana University.

John Holbein is Assistant Professor of Political Science at Brigham Young University.

Christopher Skovron is a postdoctoral Data Science Scholar at Northwestern University.
