Improving Voter List Maintenance: Fast-ER!

R. Michael Alvarez

We are very excited to announce a new and very fast GPU implementation of Fellegi-Sunter probabilistic record linkage methods, which promises to significantly speed up many of the quantitative approaches used for voter registration database maintenance. This new package was engineered by Jacob Morrier (a PhD student at Caltech) with the assistance of Sulekha Kishore (a Caltech senior).

The new package is called Fast-ER. It is implemented in Python and available as open-access software.

Why is this package useful for voter registration list maintenance?

Turning back the clock two decades, one of the important changes introduced into the administration of U.S. elections was the Help America Vote Act's requirement that every state implement a computerized statewide voter registry. These registries were developed in response to registration problems reported by voters, and to the need to improve how voter registries were updated and maintained.

Fast-forward two decades, and there’s been quite a bit of progress in developing and implementing methods for detecting various types of potential errors in voter registration databases. One of those approaches is based on work that we have been doing with various state and local election officials, with the methodologies summarized in two papers. The first paper, published in 2019, presents an approach for detecting anomalies in records (like duplications or record changes) at the county level (published version, preprint, GitHub). The second paper presents an approach for detecting anomalies in a larger jurisdiction, like a state, where there are significant differences across local units like counties (published version, preprint, GitHub). Both approaches are based on probabilistic record linkage methods that are common in the literature, and in particular on the very useful FastLink package.

Fast-ER promises to dramatically speed up the type of probabilistic record linkage used to check for potential duplicate records, anomalous record changes, and other potential inaccuracies in voter registration data. In experiments that Jacob and Sulekha have run using North Carolina voter records, Fast-ER runs up to 60 times faster than other methods on larger datasets.
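For readers who have not seen Fellegi-Sunter scoring before, here is a minimal sketch of the core idea in plain Python. This is not Fast-ER's API or code: the field names, the m- and u-probabilities, and the prior match rate are all made-up values chosen for illustration. In practice these parameters are estimated from the data (typically with the EM algorithm), and the comparison is run over very many record pairs, which is where GPU acceleration helps.

```python
import math

# Hypothetical m- and u-probabilities for three fields (illustrative values,
# not estimated from any real voter file).
#   m = P(field agrees | the two records refer to the same voter)
#   u = P(field agrees | the two records refer to different voters)
PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "birth_year": (0.98, 0.05),
}


def match_weight(agreements):
    """Sum of log2 likelihood ratios over fields for one record pair."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = PARAMS[field]
        ratio = m / u if agrees else (1 - m) / (1 - u)
        total += math.log2(ratio)
    return total


def match_probability(agreements, prior=0.001):
    """Posterior probability of a match, given an assumed prior match rate."""
    likelihood_ratio = 2 ** match_weight(agreements)
    odds = likelihood_ratio * prior / (1 - prior)
    return odds / (1 + odds)


all_agree = {"last_name": True, "first_name": True, "birth_year": True}
one_differs = {"last_name": True, "first_name": False, "birth_year": True}

for label, pair in [("all fields agree", all_agree),
                    ("first name differs", one_differs)]:
    print(label, round(match_weight(pair), 2), round(match_probability(pair), 3))
```

Under these assumed parameters, a pair that agrees on all three fields gets a much higher weight, and a much higher match probability, than a pair whose first names differ. Pairs are then classified as matches, non-matches, or candidates for review by comparing that score against chosen thresholds.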

In the past, when our group was collaborating with some states to use probabilistic record linkage to study the integrity of their voter registries, larger state files could take almost a day for us to process, using a pretty sophisticated cloud computing instance with a lot of memory. Now, with Fast-ER, we are optimistic that we can turn around reporting to election officials in a matter of hours rather than days. This new method will also allow faster evaluation of projects that involve multiple states, which is something that we hope to conduct in the future.
