Past vote data outperformed the polls. How did it go so wrong? | Sol Messing

Polling versus past votes

Perhaps what surprised me the most about polling this time around was when I went to evaluate some election projections I put together in April that we used internally at Acronym to help evaluate where we might want to spend. I pulled in the NYTimes polling averages and compared them with the latest state-level presidential results from the AP. I then did the same for the April projections. Turns out the projections were significantly more accurate than the polling averages:

What went wrong: The Usual Suspects

Humble-brag aside, it’s worth asking what might have gone wrong with polling in 2020?

Other Potential Factors

Likely voter models: This is difficult to fully unpack since each polling house does this slightly differently and not all publish their methods — some ask a battery of voter questions, some use models, some recruit off the voter file. But there’s only a weak relationship between who votes and who scores high on the likely voter battery. To make matters worse, 2020 was a very high-turnout election, which could have introduced even more instability into likely voter models.

The Role of Election Forecasts

If you’re a forecaster, it’s very easy to look at all the polling data and come away with overconfident estimates of a candidate’s support. Many forecasters in 2016 did just that, failing to account for the fact that error between states and pollsters were likely correlated, and producing estimates that put Clinton’s chances above 95%.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store