“There are three kinds of lies: lies, damned lies, and statistics.” — Mark Twain
But to paraphrase a slogan of gun-rights supporters: numbers don’t lie. People lie.
People use numbers to lie.
Sometimes it’s unconscious. If I say, “Most people believe X,” the word “most” is imprecise and suggests I may not have studied the subject enough. “Nine out of ten people believe X” sounds as though I’ve studied the people and counted them.
I just want to post a few notes and recommend a few books.
- Many (notice that I haven’t counted them) misrepresentations involving statistics come from ignoring the margin of error when sampling is involved. Newspapers describe poll results with words like “surge” or “plummet” even when the change is within that margin of error. The context that matters here is the precise nature of the numbers themselves. There’s a difference between “there are five books on my table” and “the average American will have five books on their bedside table.”
- Ask how the numbers were generated. If it was a survey, what was the question? In professional surveys, you can trace the numbers back to the survey itself; reliable, ethical researchers show their work. For example, if you ask a number of people whether they would kill their own mother for a million dollars, you don’t learn how many actually would. You learn how many say they would. The context here is the underlying basis for the numbers. It’s the difference between “I would guess that about 5 in 10 people do X,” “I asked them,” and “I had a hidden camera on them and watched them.” All three methods generate numbers, but the numbers mean different things.
- Numbers don’t generate predictions on their own. They describe a state of affairs, defined by the way they were collected and presented. The context of a numeric prediction can be complex: the prediction is only as valuable as the theory that produces it from the numbers, even if the prediction itself is expressed as a number.
- Probability is a complex field, and I enjoy reading about it. Numbers used to express probability are often difficult to follow and can seem unintuitive. For those of us who are not mathematicians, a key point to notice is whether the person quoting the probability could actually have the necessary information. If you cannot model an entire process in detail, you don’t know the probability. Let me illustrate from the original Star Trek series. In “Errand of Mercy,” Spock and Kirk are on Organia and are unaware of the true nature of the Organians. Kirk asks Spock to rate their chances, and Spock gives a number precise to decimal places. That number is meaningless, because Spock doesn’t actually know what he would need to know to calculate it.
- When numbers are compared, the context of every number in the comparison matters, as does the relationship between them. The information in a comparison lies less in the numbers themselves than in the theory or theories used to connect them.
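To make the margin-of-error point from the first note concrete, here is a small Python sketch. It assumes simple random sampling, a 95% confidence level, and the usual normal approximation; real polls use weighting and other adjustments that generally make the margin larger. The poll numbers are invented for illustration, and the function names are my own.

```python
import math

Z_95 = 1.96  # critical value for a 95% confidence level (normal approximation)


def margin_of_error(p: float, n: int, z: float = Z_95) -> float:
    """Margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


def change_is_within_noise(p1: float, n1: int, p2: float, n2: int, z: float = Z_95) -> bool:
    """True if the change between two independent polls is smaller than
    the margin of error of the difference (normal approximation)."""
    # The uncertainty of a difference combines the uncertainty of both polls,
    # so it is larger than either poll's own margin of error.
    se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return abs(p2 - p1) < z * se_diff


# Hypothetical headline: support "surges" from 48% to 51% across two polls of 1,000 people each.
moe = margin_of_error(0.48, 1000)
print(f"Margin of error for one poll: +/- {moe:.1%}")
print("Change within noise:", change_is_within_noise(0.48, 1000, 0.51, 1000))
```

With 1,000 respondents, each poll’s margin is about 3.1 points, and the margin for the change between two such polls is about 4.4 points. Under these assumptions, a three-point “surge” is well within the noise.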
In my opinion, most (note how I say this) news articles and popular presentations that involve numbers misrepresent their meaning in some way. In most cases (note again!), the error doesn’t matter much to readers. But in many unfortunate cases it does. If you count headlines, the misrepresentation is even worse.
So when reading numbers, look for the source, check the context, understand the theory. The numbers may be correct, but the theory and presentation may make them deceptive.
Here are a few books on this subject that are quite readable.
How to Lie with Statistics. This is an older book, but provides the basics in a readable format.
Lies, Damn Lies, and Statistics. I found this more politically slanted, but the principles expressed are quite good, in my opinion. The bias I noted was in the examples. Note that I did not statistically check my impression of bias!
Statistics: A Spectator Sport. I haven’t read this book, but it’s on my reading list. Its description notes that it draws its examples largely from education. I suspect that could be valuable, since such examples may be less controversial than primarily political or media ones.