First-of-its-kind database now available for election research
Announcing a standardized, multi-state database of cast vote records
For the first time, those studying elections in the United States can access a standardized, multi-state database of cast vote records from the 2020 general election.
Ballots—the core records of an election—are rarely made available on an individual level. Now, researchers from MIT’s Election Data + Science Lab, Yale University, the University of California, Los Angeles, Columbia University, and Harvard University have announced a new database of cast vote records from the 2020 U.S. general election. The database, described in a study in press at a peer-reviewed journal, was compiled from anonymized digital records of actual ballots cast by 42.7 million voters. The resulting dataset covers 20 states and includes vote records for more than 2,204 candidates at all levels, from national races to local seats.
This database serves as a uniquely granular administrative dataset for studying voting behavior and election administration, and is now available on the Dataverse for use by other researchers. In recognition of their work to standardize and publish these records, the team behind the project received an MIT Prize for Open Data, which was established to highlight the value of open data at MIT and to encourage the next generation of researchers.
While most election watchers focus on candidates’ vote totals (which are publicly certified and regularly reported by states), researchers and litigators often seek out records at the individual ballot level. “Cast vote records,” or CVRs, offer an anonymized electronic record of actual ballots cast and are indispensable in many election-related analyses. Historically, however, these records have been difficult to find, inconsistently formatted, and never independently evaluated for accuracy.
What may seem at first glance to be a niche annoyance to academics took on greater significance following the 2020 election, when officials in election offices around the country began receiving a flood of requests for CVRs following unfounded accusations of voting irregularities. Academic evaluations based on accurate vote records can play a key role in assessing and alleviating concerns about voter fraud and election security.
The researchers behind this project believe their work will be useful to academics who study electoral behavior, as well as those in election law and administration who study the integrity of the electoral process itself. Because CVRs capture data at the individual level and record anonymized voters’ actual choices across a full ballot, from the presidential race to neighborhood-level offices, they allow researchers to investigate important aspects of voting behavior far more accurately than other methods. While CVRs cannot be used to directly check results like an audit, the ballot-level data they provide can be used to investigate and explain unanticipated election results, potentially allaying voters’ mistrust of the vote counting process.
The team also hopes their work can serve as a standard for future work as election officials and academics work together toward a common understanding of election technology and its use in upholding the legitimacy of elections.
“Election administrators across the country have been grappling with tradeoffs under intense scrutiny: whether to release or restrict access to data like cast vote records, and how to balance such transparency with their other critical responsibilities,” said Shiro Kuriwaki, a Faculty Fellow at Yale University’s Institution for Social and Policy Studies, who co-led the project. “Our independent examination and release of this dataset, I hope, shows a way forward: cast vote records can be transparently made available without being politicized.”
“Cast Vote Records: A Database of Ballots from the 2020 U.S. Election” can be accessed at https://doi.org/10.7910/DVN/PQQ3KV. The study was led by Kuriwaki and Mason Reese, a Ph.D. student at MIT and graduate researcher with the MIT Election Data + Science Lab (MEDSL). Co-authors on the study included Jeffrey Lewis of UCLA; Taran Samarth of Yale; Samuel Baltz, Joseph Loffredo, Kevin Acevedo Jetter, Zachary Djanogly Garai, Kate Murray, and Charles Stewart III of MIT and MEDSL; Aleksandra Conevska, Can Mutlu, and James Snyder Jr. of Harvard; and Shigeo Hirano of Columbia.