Those statistics are lying to you (maybe)

Darrell Huff first published How to Lie with Statistics in 1954, in which he explained how statistics were employed to intentionally deceive the inattentive consumer of information. This wasn’t a new idea; several of Huff’s predecessors had arrived at the same conclusion:

“There are three kinds of lies: lies, damned lies, and statistics.”
— attributed to British Prime Minister Benjamin Disraeli (1804-1881)

“Facts are stubborn things, but statistics are pliable.”
— Mark Twain (1835-1910)

“[Some individuals] use statistics as a drunk man uses lamp-posts — for support rather than for illumination.”
— Andrew Lang (1844-1912), Scottish writer and anthropologist

Some were even more skeptical of the use of statistics (sorry, statisticians):

“If your experiment needs a statistician, you need a better experiment.”
— Ernest Rutherford (1871-1937), best known for establishing the planetary model of the atom

With that in mind, let’s look at some of the lessons from Huff’s book by picking out forms of misleading statistics:

Biased samples (also called ascertainment bias or systematic bias)

Samples can be biased in various ways, either intentionally or unintentionally, but all “systematically favor some outcomes over others.”
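To see how a convenience sample systematically favors one outcome, here is a minimal sketch with hypothetical numbers: surveying commute times in a parking lot over-represents drivers, pulling the estimate well below the true population average.

```python
from statistics import mean

# Hypothetical population: commute times (minutes) for two groups.
# Drivers are easier to reach in a parking-lot survey, so a
# convenience sample over-represents them -- a biased sample.
drivers = [15, 20, 25, 30, 35]         # group mean: 25 min
transit_riders = [40, 45, 50, 55, 60]  # group mean: 50 min

population = drivers + transit_riders
biased_sample = drivers + transit_riders[:1]  # mostly drivers

print(f"True average commute:   {mean(population):.1f} min")    # 37.5
print(f"Biased-sample estimate: {mean(biased_sample):.1f} min")  # 27.5
```

The bias here has nothing to do with sample size; collecting ten times as many parking-lot responses would only make the wrong answer more precise.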

Biased averages

Huff describes the three common averages: mean, median, and mode. In a normally distributed sample (think "bell curve"), the three are close together, but in a skewed or irregular distribution they can differ dramatically.
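A quick sketch, using hypothetical salary figures, shows how much the choice of "average" matters once a distribution is skewed: a single large outlier drags the mean far above what a typical person earns.

```python
from statistics import mean, median, mode

# Hypothetical, right-skewed salaries: one executive pulls the mean up.
salaries = [30_000, 30_000, 35_000, 40_000, 45_000, 50_000, 500_000]

print(f"mean:   {mean(salaries):,.0f}")    # ~104,286 -- inflated by the outlier
print(f"median: {median(salaries):,.0f}")  # 40,000
print(f"mode:   {mode(salaries):,.0f}")    # 30,000
```

All three are honestly "the average salary," which is exactly why a writer with an agenda gets to pick the one that tells their story.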

Misleading visualizations

By skewing elements of graphs and charts, like the scale, you can paint a picture very different from the underlying truth. Look at these graphs below, and you’ll spot what’s wrong.

[Charts referenced: gun deaths in Florida; recruiting more nurses; Microsoft fundraising]
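The most common trick in those charts is a truncated axis. A small numeric sketch (with made-up poll numbers) shows why: bar heights are proportional to the distance from the axis baseline, so cutting the baseline exaggerates a tiny gap.

```python
# Two hypothetical poll results, nearly tied.
a, b = 52.0, 54.0

# A bar's drawn height is proportional to (value - axis baseline).
honest_ratio = (b - 0) / (a - 0)       # axis starts at zero
truncated_ratio = (b - 50) / (a - 50)  # axis starts at 50

print(f"Zero baseline: B's bar is {honest_ratio:.2f}x A's")   # 1.04x
print(f"Axis cut at 50: B's bar is {truncated_ratio:.2f}x")   # 2.00x
```

Same data, but the truncated chart makes a 4% lead look like a blowout.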

“Significant findings” and throwing out data

Statistical significance is determined by hypothesis testing and numeric calculation. However, “insignificant findings” are often conflated (perhaps intentionally) with “didn’t support my stance.”
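A sketch of the related trap, significance hunting: a single result can look "significant" by the usual p < 0.05 cutoff, yet if you run many experiments and quietly report only the best one, a false positive becomes likely. The coin-flip setup below is illustrative, not from Huff's book.

```python
from math import comb

def binom_p_value(heads: int, flips: int) -> float:
    """One-sided p-value: chance of at least `heads` heads from a fair coin."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2**flips

# 15 heads in 20 flips looks "significant" on its own...
p = binom_p_value(15, 20)
print(f"p = {p:.4f}")  # ~0.0207, below the usual 0.05 cutoff

# ...but run 20 such experiments and report only the best, and the
# chance that at least one clears the bar by luck alone is large:
p_any = 1 - (1 - p) ** 20
print(f"P(at least one 'significant' result in 20 tries) = {p_any:.2f}")  # ~0.34
```

This is why "we found a significant effect" means little without knowing how many comparisons were run and how many results were thrown out.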

Semi-attached figures

In the words of Huff: “If you can’t prove what you want to prove, demonstrate something else and pretend that they are the same thing.”

Correlation vs. Causation

Just because two things are correlated doesn’t necessarily mean one causes the other, and there is no shortage of fun examples.
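The classic illustration pairs ice cream sales with drownings: both rise and fall with summer weather, so they correlate strongly while neither causes the other. Here is a small sketch with hypothetical monthly figures, computing Pearson's r from scratch.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from the definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical monthly figures: both series follow summer weather.
ice_cream_sales = [20, 25, 40, 60, 80, 95, 90, 70, 45, 25]
drownings       = [2,  3,  5,  8, 11, 13, 12,  9,  6,  3]

r = pearson_r(ice_cream_sales, drownings)
print(f"r = {r:.3f}")  # near-perfect correlation, zero causation
```

A hidden third variable (here, temperature) driving both series is called a confounder, and it is the usual culprit behind spurious correlations like this one.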

So, how can you avoid falling victim to the statistical charlatans? Start with these questions:

  • Who is making the claim?
  • How do they know/where is the evidence?
  • Are we provided the whole picture?
  • Has someone changed the topic? Are we still discussing the original question?
  • Does the argument make sense?

Now you, too, can fight the good fight against misleading statistics!

 

Top image by Ashley Saunders

 

About the Author


Roberto Carozza

Roberto Carozza is a data analyst on HWC’s Data & Design team. He graduated from the University of Wisconsin-Madison with a Bachelor of Arts in applied physics and French literature, bridging his love for both the humanities and science. Roberto went on to obtain his master’s in data analytics from American University, which he uses to dabble in web development, data visualization, virtual reality art and game development, and anything else that captures his eclectic interests.
