We use a machine learning model based on the transformer architecture to replicate and expand the Comparative Agendas Project’s coverage of American legislatures. Our model is jointly trained on pre-coded Congressional and Pennsylvania legislation, and it compares favorably to extant supervised machine learning models. Using Pennsylvania as a keystone allows us to bridge the national and state legislative contexts and to produce 1.687 million estimates of the leading policy in legislative documents from Congress and the 50 state legislatures since about 2009. Validations show that the model agrees with human coders on the vast majority of policy assignments, and that the disagreements stem more from inconsistencies in the codebook’s logic than from random error. We discuss the challenges of applying a model like this to the study of legislative institutions.
Journal of Political Institutions and Political Economy, Volume 6, Issue 3-4, Special Issue: Artificial Intelligence and the Study of Political Institutions