How The Washington Post creates its presidential polling average

If you want to know who is ahead in the presidential election, there are plenty of places to look these days.

Many media organizations have their own versions of presidential polling averages; unlike us, some even produce election forecasts, which take into account the state of the polls plus other relevant variables (like economic data) and then project who will be the likely victor.

At The Washington Post, we have our own polling average. It currently shows Vice President Kamala Harris with a two percentage point lead in the national popular vote. It also shows her ahead in Pennsylvania, Michigan and Wisconsin — which, if the election happened today and the polls were spot on, would give her the barest electoral college majority. In our polling average, Donald Trump currently leads in Arizona and Georgia, while North Carolina and Nevada are virtually tied.

We understand there is a lot of data out there, and it can be hard to tell the various models and their methodologies apart. So we thought it might be useful to highlight and explain our approach.

It’s in the polls

The largest difference between our averages and others is that we only include the highest quality state and national polls. As we describe in our methodology: “We only include polls that are transparent about how they are conducted … and use methods that have demonstrated accuracy.” We also don’t include any partisan polls or polls that are released by candidates or campaigns.

Our polling team has evaluated different survey methods, and we include only pollsters that apply the best ones. Things we consider include sample size and how respondents are contacted, such as whether they are reached only online or by landline rather than by cellphone. No matter how voters are contacted, the sample that pollsters end up with can still overrepresent certain segments of the electorate. So we also look at how pollsters adjust for the mismatch between their polling sample and the voting population.
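To make the adjustment idea concrete, here is a minimal sketch of one common technique, reweighting a sample so each group counts in proportion to its share of the population. It is an illustration only, not a description of how any particular pollster or The Post does it. The function name, the two groups and all the numbers are hypothetical.

```python
def weighted_support(sample_counts, population_shares, support_by_group):
    """Return (raw, weighted) support for a candidate.

    raw      -- support if every respondent counts equally
    weighted -- support if each group counts by its population share
    """
    total = sum(sample_counts.values())
    raw = sum(
        sample_counts[g] / total * support_by_group[g] for g in sample_counts
    )
    weighted = sum(
        population_shares[g] * support_by_group[g] for g in sample_counts
    )
    return raw, weighted


# Hypothetical poll: college graduates are 60% of respondents
# but (by assumption) only 40% of the voting population.
sample_counts = {"college": 600, "non_college": 400}
population_shares = {"college": 0.40, "non_college": 0.60}
support_by_group = {"college": 0.55, "non_college": 0.45}

raw, weighted = weighted_support(
    sample_counts, population_shares, support_by_group
)
print(f"unweighted: {raw:.2%}, weighted: {weighted:.2%}")
```

In this made-up example, the overrepresented group leans toward the candidate, so the unweighted number overstates the candidate’s support and the weighted number pulls it back down.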

It may not seem obvious, but there are pros and cons to this approach. In 2020, the highest quality polls did slightly worse at predicting the election outcome than lower quality surveys. But on average, and over multiple elections, higher quality polls are more reliable and accurate.

We don’t overreact

Another distinguishing feature of our model is that it’s cautious. By that we mean that our model is built to be slower to change with the arrival of new polls.

There are two reasons for that.

First, our model is designed not to overreact to tiny changes in the polls that may not reflect actual shifts in the race. Smaller polling movements could be explained, for instance, by partisan nonresponse, which means one candidate’s supporters are less likely to respond to the pollster. Second, because we only include the highest quality polls, we end up including fewer polls than some other models. Thus, our average might take a bit longer to register changes. This gives us more confidence that the changes our model does show are real shifts in public opinion.
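One simple way to picture a cautious average is one that moves only a fraction of the way toward each new poll. The sketch below uses basic exponential smoothing, which is a stand-in chosen for illustration and not the model The Post actually runs. The function name, the `weight_on_new` parameter and the poll margins are all hypothetical.

```python
def cautious_average(readings, weight_on_new=0.2):
    """Running average that moves only part of the way toward each new poll.

    A small weight_on_new means the average is slow to change;
    a large one means it chases every new reading.
    """
    estimate = readings[0]
    history = [estimate]
    for reading in readings[1:]:
        estimate = estimate + weight_on_new * (reading - estimate)
        history.append(estimate)
    return history


# Hypothetical margins (in points) with a single outlier poll in the middle.
margins = [2.0, 2.0, 5.0, 2.0, 2.0]

cautious = cautious_average(margins, weight_on_new=0.2)
reactive = cautious_average(margins, weight_on_new=0.8)

print("cautious:", [round(x, 2) for x in cautious])
print("reactive:", [round(x, 2) for x in reactive])
```

With these made-up numbers, the cautious version barely moves when the outlier arrives, while the reactive version jumps toward it and then has to walk the move back.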

If you follow our model closely, you might have noticed that in the week after the Sept. 10 debate between Harris and Trump, a few polls came out showing Harris with increased support.

But our model barely changed. That was partly because those polls came from pollsters that had already shown better-than-average results for Harris this election cycle relative to other pollsters, so our model discounts the gains they report for Harris a bit. (This observation does not imply that a pollster is biased; such a trend can arise from different, reasonable models of the electorate.) But it’s also because, while Harris improved post-debate in those polls, the changes were within the range our model would expect to see by random chance even if there had been no change in support at all.
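The random-chance point can be illustrated with textbook sampling arithmetic. The sketch below checks whether the change in a candidate’s share between two polls is larger than the sampling noise alone would allow. It assumes simple random samples that are independent of each other, which real polls only approximate, and it is not The Post’s model. The function name and the poll numbers are hypothetical.

```python
import math


def change_within_noise(p_before, n_before, p_after, n_after, z=1.96):
    """Compare a poll-to-poll change against its approximate 95% noise band.

    Assumes two independent simple random samples (an idealization).
    Returns (within_noise, change, noise_band), all as proportions.
    """
    standard_error = math.sqrt(
        p_before * (1 - p_before) / n_before
        + p_after * (1 - p_after) / n_after
    )
    change = p_after - p_before
    noise_band = z * standard_error
    return abs(change) <= noise_band, change, noise_band


# Hypothetical: a candidate goes from 48% to 50% in two polls of 1,000 people.
within, change, band = change_within_noise(0.48, 1000, 0.50, 1000)
print(f"change: {change:+.1%}, noise band: +/-{band:.1%}, within noise: {within}")
```

In this made-up case, a two-point gain sits well inside a noise band of roughly four points, so on its own it would not be evidence of a real shift.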

And a few days later, we got additional high-quality polls that showed minor to no improvement for Harris after the debate, showing why our slow-and-steady approach is more useful.

That’s not to say our model doesn’t adapt at all. You can see how much our polling average shifted after Harris took over the Democratic ticket — she improved on Biden’s position by more than four percentage points.

It’s just that we built our model to require significant evidence before accepting that changes in polls are real. In the case of the debate, we found that our model would have needed to see another two or three Harris-favoring polls before it would have inferred that there was a real post-debate bump in her support.

State and national

Another feature of our model worth highlighting is that estimates in each state adjust to new polling, even when that data is from national or out-of-state races.

This is important because there are many more national polls than state-level polls, and even fewer state polls make it into our average because we are so picky about which polls we include. This means that in some states, we can go long periods without any state-specific polls. National polls, and polls from demographically similar states, help fill out the picture of what might be happening in key swing states.

Take Nevada, where there have been fewer high-quality polls lately. Instead of flying blind in Nevada, our model relies on changes in the national environment (estimated by national polls), as well as polls from demographically similar states like Arizona, to give us an estimate of what might be happening in the Silver State. And since our national polls have been nearly constant for a few weeks, so has our average in Nevada.
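As a rough picture of how a state estimate can move without new state polls, the sketch below shifts a state’s last known margin by a blend of the national shift and the shift seen in similar states. The blending rule, the `similar_weight` parameter, the function name and the numbers are all assumptions made for illustration; the article does not spell out the actual formula.

```python
def infer_state_margin(last_state_margin, national_shift,
                       similar_state_shifts, similar_weight=0.5):
    """Estimate a state's margin when it has no fresh polls of its own.

    last_state_margin    -- most recent estimate for the state, in points
    national_shift       -- change in the national margin since then
    similar_state_shifts -- changes seen in demographically similar states
    similar_weight       -- hypothetical share of the blend given to those states
    """
    if similar_state_shifts:
        neighbor_shift = sum(similar_state_shifts) / len(similar_state_shifts)
        shift = ((1 - similar_weight) * national_shift
                 + similar_weight * neighbor_shift)
    else:
        # No similar-state data either: fall back on the national shift alone.
        shift = national_shift
    return last_state_margin + shift


# Hypothetical: national polls are flat and so is a similar state,
# so the state estimate stays where it was.
steady = infer_state_margin(0.5, national_shift=0.0, similar_state_shifts=[0.0])

# Hypothetical: the national margin moves 1 point and a similar state moves 2.
moved = infer_state_margin(0.5, national_shift=1.0, similar_state_shifts=[2.0])

print(steady, moved)
```

The first call mirrors the situation described above: when the national picture holds steady, the inferred state number holds steady too.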

This feature was especially beneficial right after Biden dropped out. For a few weeks, we were only getting high-quality national polls and no new state ones. But our approach allowed us to infer what was likely to be happening in the swing states, which is much more important for understanding who is likely to win the presidential election.

It’s not a forecast

Finally, an important thing to keep in mind about our polling average is that it’s not a forecast. Instead, it’s our best estimate of the current state of public opinion. The difference might seem minimal, but we believe it’s the better way of understanding elections.

Polls are a snapshot in time, but many things can happen between today and Election Day that a poll cannot account for. Other methods try to model the range of possible changes, but there are a lot of assumptions necessary to turn a polling average into a predictive forecast. These assumptions are often opaque and unverifiable. So we’ve decided to stay away from that, and instead display our best estimate for the current state of the presidential election.

In sum, we believe our approach is the best way of understanding the current state of the race. We include the highest quality and most trustworthy polls; we’ve designed our model to update carefully and deliberately; and we’ve made sure that we can always show the best and most up-to-date data across different states and the national environment.

This post appeared first on washingtonpost.com