Those statistics are lying to you (maybe): The election edition


With the midterm elections rapidly approaching, voters are being inundated with a flurry of political statistics. A prospective voter referencing news sources from both sides of the aisle can easily find differing–even outright opposing–forecasts for election outcomes. How is this possible?

Polls drive forecasting

The only way to truly predict the outcome of an election is to know for whom voters will cast their ballots–and the only way to do that is to ask the voters. However, asking every registered voter in the United States is a Herculean (or maybe more of a Sisyphean) task that would require ample resources and time. To circumvent this issue, various institutions conduct polls, in which they ask samples of the population a series of questions about themselves and how they plan to vote. As of May 30th, 2018, FiveThirtyEight identified 396 pollsters in the United States and ranked them based on their statistical methodology and historical accuracy. And here, already, is where things start getting messy. Each of those pollsters conducts its polling differently; they ask different groups of people different questions in different ways, all of which result in different forecasted outcomes for an election.

Different groups of people = Sampling

A key factor contributing to the variation in polls is sampling, or how a subset of a population is selected to estimate the behavior of the whole. With a truly random sample, one where every possible subset of voters has an equal probability of being selected, the most likely issue you will run into is simple statistical error. Simply stated, statistical error is the uncertainty that comes from polling only a fraction of the electorate rather than all of it: the more voters you poll, the greater your statistical certainty.
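
To make the relationship between sample size and statistical error concrete, here is a minimal sketch in Python. The 50/50 race and the 95% confidence level are illustrative assumptions, not figures from any real poll:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion at ~95% confidence."""
    return z * math.sqrt(p * (1 - p) / n)

# Assume a hypothetical 50/50 race -- the worst case for sampling error.
for n in (100, 400, 1600, 6400):
    print(f"n = {n:>5}: +/- {margin_of_error(0.5, n):.1%}")
```

Note that quadrupling the sample only halves the margin of error, which is part of why pollsters rarely interview more than a few thousand people.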

However, achieving a truly random sample isn’t easy, and failing to do so leads to systematic errors, or fundamental problems with how your data was collected. This type of error doesn’t improve by increasing the number of voters polled, and egregious systematic errors can invalidate entire studies.
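
A quick simulation shows why more interviews cannot rescue a biased sample. In this sketch every number is made up: the population splits 40/60 between landline and cell-only voters, but the sampling frame reaches landline voters far more often, so the estimate stays stuck several points from the truth no matter how many calls are made:

```python
import random

random.seed(0)

# All figures below are hypothetical, purely for illustration.
TRUE_SUPPORT = {"landline": 0.58, "cell_only": 0.44}   # candidate support by group
POPULATION   = {"landline": 0.40, "cell_only": 0.60}   # true population mix
FRAME        = {"landline": 0.80, "cell_only": 0.20}   # mix the poll actually reaches

def poll(n, mix):
    """Simulate polling n voters drawn according to the given subgroup mix."""
    yes = 0
    for _ in range(n):
        group = "landline" if random.random() < mix["landline"] else "cell_only"
        yes += random.random() < TRUE_SUPPORT[group]
    return yes / n

truth = sum(POPULATION[g] * TRUE_SUPPORT[g] for g in TRUE_SUPPORT)
print(f"true support: {truth:.1%}")
for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}: biased-frame estimate {poll(n, FRAME):.1%}")
```

Larger samples simply tighten the estimate around the wrong number; only fixing the sampling frame fixes the error.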

Different questions in different ways = Data collection

If sampling is a pillar of a statistically sound survey, then the questions you ask of your sample, and how you ask them, are at least equally important. Voters’ choice of candidate ultimately drives political polling, but collecting additional information, like location, socio-economic status, or education level, facilitates significantly more in-depth analysis. If this additional data isn’t collected in a methodologically rigorous way, you risk collecting biased data, or data that is skewed toward a particular outcome.

For example, older telephone polling used a random-dialing approach to poll future voters, where landlines were called by machines. Response rates eventually fell to the point where only about 10% of calls resulted in an interview with a voter, resulting in a smaller sample. The question then became whether the individuals agreeing to be interviewed differed fundamentally from the rest of the sample pool. Did they tend to be older, or to belong to a particular political party? Had they completed a higher level of education, on average, than other voters in the sample? If so, this would be an example of nonresponse bias, where the people who respond differ systematically from those who do not.
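
One common, if imperfect, remedy is to weight responses back to known population shares, a technique often described as post-stratification. The sketch below uses invented numbers: older voters respond far more often than younger ones, so the raw average overstates their preferred candidate until the groups are reweighted to hypothetical census shares:

```python
# Hypothetical poll: respondents and support for Candidate A, broken out by age.
responses = {
    "18-44": (150, 0.40),   # (number of respondents, share backing Candidate A)
    "45+":   (850, 0.60),
}
census_share = {"18-44": 0.45, "45+": 0.55}   # assumed population shares

total_respondents = sum(n for n, _ in responses.values())
raw = sum(n * p for n, p in responses.values()) / total_respondents
weighted = sum(census_share[g] * p for g, (_, p) in responses.items())

print(f"raw estimate:      {raw:.1%}")        # pulled toward the heavy responders
print(f"weighted estimate: {weighted:.1%}")   # reweighted to the census shares
```

Weighting like this only works if the poll collected those extra variables in the first place, which is exactly why rigorous data collection matters.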

Let’s now assume that you’ve selected a perfectly representative sample; there are no inherent biases in how the voters were chosen or in the makeup of the resulting group to be interviewed. You’ve also identified, conceptually, the perfect set of questions to ask for insightful analysis and forecasting. What could go wrong? Here you could face another set of challenges stemming from research bias, or the fact that you, the pollster, may not be asking fair questions. This bias can arise from intentionally seeking certain answers through leading or carefully worded questions. It can also stem from unintentional factors, like the order of your questions, or the fact that respondents may interpret your questions differently due to their cultural background.

It all comes down to you

The examples of bias above show a few of the many ways in which a study or poll can generate misleading conclusions. Whether these biases were introduced intentionally or through error, it is up to the consumer of the information to make sure that they protect themselves from statistical shenanigans. We live in an age of unprecedented access to information–let’s make the most of it!


Graphic by Christina Wagner

About the Author


Roberto Carozza

Roberto Carozza is a data analyst on HWC’s Data & Design team. He graduated from the University of Wisconsin-Madison with a Bachelor of Arts in applied physics and French literature, bridging his love for both the humanities and science. Roberto went on to obtain his master’s in data analytics from the American University, which he uses to dabble in web development, data visualization, virtual reality art and game development, and anything else that captures his eclectic interests.
