
Looking at Precinct-Level Election Data

A look at ensuring available and accessible data

Those of you who follow election data updates closely might have noticed that on Tuesday, we released another major update to the MIT Election Lab’s precinct-level 2016 returns.

This brings us closer to publishing complete 2016 election results, up and down the ballot, from every precinct in the country. Tuesday's release added 11 states, extending coverage to 36 states and the District of Columbia. Returns from the remaining 14 states will complete geographic coverage.

You can find the latest release of precinct-level data on the MEDSL Dataverse, organized into presidential, U.S. Senate, U.S. House, state, and local datasets. (If you work in R, our elections package also provides these datasets in an easily accessible format.)

Historically, the availability of returns at the precinct or even county level has been poor. To overcome challenges in data collection, the Lab relies on the work of our undergraduate contributors; our colleagues at the University of Florida's United States Election Project; and another open data effort, the Open Elections project, whose volunteers have parsed some particularly difficult returns. We hope that what we've learned and built while working with the 2016 data will help us move quickly through the 2018 returns after this year's midterm elections.

After our initial data collection, our focus turns to normalization and validation. During normalization, we resolve internal inconsistencies so that, for example, the names of candidates and parties appear the same way throughout the data, which makes analysis more predictable and less error-prone. During validation, we test that the data meet our expectations: precinct votes aggregated to the state level should match official state-level returns, for instance, and known jurisdictions, candidates, and parties should appear in the data where anticipated. We haven't defined a formal specification for the datasets, but our pre-release checks enforce a growing number of constraints. If you ever discover an issue when using our data, please send us an email!
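The two kinds of checks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Lab's actual pipeline: each precinct row is assumed to be a dict with hypothetical `state`, `candidate`, and `votes` fields, name normalization uses a hypothetical alias table, and official state totals are supplied as a dict keyed by (state, candidate). The state "XX", the candidate names, and the vote counts are invented.

```python
from collections import defaultdict

# Hypothetical alias table: uppercase, whitespace-collapsed spellings -> canonical names.
CANDIDATE_ALIASES = {
    "SMITH, JANE": "Jane Smith",
    "DOE, JOHN": "John Doe",
}

def normalize_candidate(raw):
    """Collapse runs of whitespace, then map known aliases to one canonical spelling."""
    cleaned = " ".join(raw.split())
    return CANDIDATE_ALIASES.get(cleaned.upper(), cleaned)

def aggregate_precincts(rows):
    """Sum precinct-level votes into {(state, candidate): total} after normalizing names."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["state"], normalize_candidate(row["candidate"]))] += row["votes"]
    return dict(totals)

def find_mismatches(precinct_rows, state_totals):
    """List (state, candidate, precinct_sum, official_total) wherever the two disagree.

    A candidate missing from either side counts as 0 there, so unexpected or
    absent candidates also show up as mismatches.
    """
    aggregated = aggregate_precincts(precinct_rows)
    mismatches = []
    for key in sorted(set(aggregated) | set(state_totals)):
        got, expected = aggregated.get(key, 0), state_totals.get(key, 0)
        if got != expected:
            mismatches.append((key[0], key[1], got, expected))
    return mismatches

# Toy rows (made-up numbers, not real returns) for a fictional state "XX".
precincts = [
    {"state": "XX", "candidate": "SMITH, JANE", "votes": 120},
    {"state": "XX", "candidate": "Jane  Smith", "votes": 80},
    {"state": "XX", "candidate": "Doe, John", "votes": 150},
]
official = {("XX", "Jane Smith"): 200, ("XX", "John Doe"): 160}
print(find_mismatches(precincts, official))  # → [('XX', 'John Doe', 150, 160)]
```

Normalizing before aggregating matters here: without the alias step, "SMITH, JANE" and "Jane  Smith" would be summed as different candidates and both would be reported as mismatches.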

It's important to us to lower the barriers to using these datasets in analysis, so we've added candidate identifiers that make it easier to merge the precinct returns with third-party data: FEC IDs for federal candidates and, from the @unitedstates project, GovTrack, ICPSR, MapLight, Open Secrets, WikiData, and Google Knowledge Graph IDs for federal incumbents and winners.
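As a rough sketch of how these identifiers help, the snippet below left-joins a third-party table onto precinct rows by FEC ID. The field name `fec_id`, the `finance` table, and all IDs and values are hypothetical placeholders; check the actual column names in the released files before adapting it.

```python
def merge_on_fec_id(precinct_rows, third_party, key="fec_id"):
    """Left-join third-party attributes onto precinct rows by FEC candidate ID.

    Rows with no FEC ID (e.g. state and local candidates) pass through unchanged.
    On a key collision, the third-party value overwrites the precinct value.
    Returns the merged rows plus a sorted list of FEC IDs with no third-party match.
    """
    merged, unmatched = [], set()
    for row in precinct_rows:
        fec_id = row.get(key)
        extra = third_party.get(fec_id, {}) if fec_id else {}
        if fec_id and fec_id not in third_party:
            unmatched.add(fec_id)
        merged.append({**row, **extra})
    return merged, sorted(unmatched)

# Made-up IDs and values for illustration only.
rows = [
    {"precinct": "P1", "office": "US HOUSE", "fec_id": "H6XX01001", "votes": 310},
    {"precinct": "P1", "office": "US HOUSE", "fec_id": "H6XX01002", "votes": 275},
    {"precinct": "P1", "office": "STATE HOUSE", "fec_id": None, "votes": 402},
]
finance = {"H6XX01001": {"total_receipts": 125000.0}}  # hypothetical third-party table
merged, unmatched = merge_on_fec_id(rows, finance)
print(unmatched)  # → ['H6XX01002']
```

A left join keeps every precinct row, so state and local races without FEC IDs survive the merge, and the list of unmatched IDs shows where the third-party source lacks coverage.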

We've also added county-level geographic variables to the data, drawn from the 2016 Election Administration and Voting Survey (EAVS) and the Census Bureau's 2017 gazetteer files. (Where local governments administer elections, these variables describe the county that contains the jurisdiction; Census place identifiers are a possible extension.)
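Here is a sketch of attaching gazetteer attributes at the county level. It assumes the tab-delimited layout and column names (`GEOID`, `NAME`, `ALAND_SQMI`, `INTPTLAT`, `INTPTLONG`) used in recent Census county gazetteer files, plus a hypothetical `county_fips` field on each precinct row. The sample file contents are invented for illustration. Where a town or other sub-county jurisdiction runs the election, its rows would carry the FIPS code of the containing county.

```python
import csv
import io

def load_gazetteer_counties(fileobj):
    """Read a tab-delimited Census county gazetteer file into {GEOID: attributes}.

    Header names are stripped because some gazetteer releases pad the last
    column name with trailing whitespace.
    """
    reader = csv.reader(fileobj, delimiter="\t")
    header = [h.strip() for h in next(reader)]
    counties = {}
    for values in reader:
        rec = dict(zip(header, (v.strip() for v in values)))
        counties[rec["GEOID"]] = {
            "county_name": rec["NAME"],
            "land_area_sqmi": float(rec["ALAND_SQMI"]),
            "lat": float(rec["INTPTLAT"]),
            "lon": float(rec["INTPTLONG"]),
        }
    return counties

def attach_county_vars(rows, counties, fips_field="county_fips"):
    """Add county attributes to each row; rows with a missing or unknown FIPS get None values."""
    empty = {"county_name": None, "land_area_sqmi": None, "lat": None, "lon": None}
    return [{**row, **counties.get(row.get(fips_field), empty)} for row in rows]

# Invented one-county sample in the assumed gazetteer layout (note the padded last header).
SAMPLE = (
    "USPS\tGEOID\tANSICODE\tNAME\tALAND\tAWATER\tALAND_SQMI\tAWATER_SQMI\tINTPTLAT\tINTPTLONG   \n"
    "XX\t99001\t00000000\tExample County\t2589988110\t0\t1000.000\t0.000\t40.000000\t-90.000000\n"
)
counties = load_gazetteer_counties(io.StringIO(SAMPLE))
precinct_rows = [
    {"precinct": "P1", "county_fips": "99001"},
    {"precinct": "P2", "county_fips": None},
]
enriched = attach_county_vars(precinct_rows, counties)
print(enriched[0]["county_name"])  # → Example County
```

Keying on the five-digit county GEOID (state plus county FIPS) keeps the join unambiguous across states, whereas county names repeat from one state to the next.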

If you’re working with precinct returns or interested in the possibility, let us know what would be useful to you!

James Dunham is an MIT Election Lab research assistant and a Ph.D. candidate in political science at the Massachusetts Institute of Technology, specializing in how policymaking in the United States responds to public opinion and interest-group activity.
