“There are three kinds of lies: lies, damned lies, and statistics.” — Mark Twain
But I can paraphrase what supporters of gun rights say: Numbers don’t lie. People lie.
People use numbers to lie.
Sometimes it’s unconscious. I can say, “Most people believe X.” But that word “most” is imprecise, and sounds like I may not have studied the subject enough. So “Nine out of ten people believe X” sounds like I’ve studied the people and counted them.
I want to just post a few notes and recommend a couple of books.
- Many (notice that I haven’t counted them) misrepresentations involving statistics result from ignoring the margin of error when sampling is involved. Newspapers use words like “surge” or “plummet” about polls when the changes are within that margin of error. The context here is understanding the precise nature of the numbers themselves. There’s a difference between “there are five books on my table” and “the average American has five books on their bedside table.”
- Ask how the numbers were generated. If it was a survey, what was the question? In professional surveys, you can trace the numbers back to the survey. Reliable, ethical researchers show their work. For example, if you ask a number of people how many would kill their own mother for a million dollars, you don’t know how many actually would. You know how many say they would. The context here is the underlying basis for the numbers. It’s the difference between “I would guess that about 5 in 10 people do x”, or “I asked them”, or “I had a hidden camera on them and watched them.” All three methods generate numbers, but the meaning is different.
- Numbers don’t generate predictions on their own. They represent a state of affairs defined by the way they are collected and presented. The context of a numeric prediction can be complex. The prediction is only as valuable as the theory that generates it from the numbers, even if it is represented in numbers.
- Probability is a complex field. I enjoy reading about it. Numbers used to express probability are often difficult to follow and can seem unintuitive. For those of us who are not mathematicians, a key point to notice is whether the person involved could actually have the necessary information. If you cannot model an entire process in detail, you don’t know the probability. Let me illustrate from Star Trek–the original series. Spock and Kirk are on Organia, and are unaware of the nature of the Organians. Kirk asks Spock to rate their chances, and he gives a number down to the decimal places. That number is irrelevant, because Spock doesn’t actually know what he would need to know.
- In number comparisons, the context of all numbers compared is important, as is their relationship. The data in a comparison is not so much in the numbers themselves but in the theory or theories used to connect them.
In my opinion, most (note how I say this) news articles and popular presentations that involve numbers misrepresent their meaning in some way. In most cases (note again!) the error is not that relevant to the readers. But in many unfortunate cases it is. If you count headlines, the misrepresentation is worse.
So when reading numbers, look for the source, check the context, understand the theory. The numbers may be correct, but the theory and presentation may make them deceptive.
Herewith a couple of books that are quite readable on this subject.
How to Lie with Statistics. This is an older book, but provides the basics in a readable format.
Lies, Damn Lies, and Statistics. I found this more politically slanted, but the principles expressed are quite good, in my opinion. The bias I noted was in the examples. Note that I did not statistically check my impression of bias!
Statistics: A Spectator Sport. I haven’t read this book, but it’s on my reading list. The description notes that it uses examples largely from education. I suspect that could be valuable as it may be less controversial than using primarily political or media examples.
One of the least accurate elements of the news, in my opinion, is the reporting of opinion polls. If you think this is always someone else, you may be part of the problem.
Polls are not precise measurements, and results vary. That’s why you have a probability (often 90% or 95%) that the results fall within a range. Headlines that report a rise or fall in poll results are frequently based on changes that are within that margin of error.
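For readers who want to see where that range comes from: for a simple random sample, the margin of error on a reported percentage can be approximated with the standard textbook formula. Here is a small Python sketch; the poll size and percentage are hypothetical, not from any poll discussed here.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p measured in a
    simple random sample of size n, at ~95% confidence (z = 1.96)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 1,000 respondents, candidate at 50% support.
moe = margin_of_error(0.50, 1000)
print(f"+/- {moe * 100:.1f} percentage points")  # prints: +/- 3.1 percentage points
```

That is why so many national polls of around a thousand people report a margin of error near three points: it falls straight out of the sample size.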
I’ve seen several reports on polls regarding President Trump, as in his approval is falling, or no it’s not. This is not for or against the president. It’s about accurate data.
Go to FiveThirtyEight.com, scroll down the sidebar until you see the graph of the president’s approval ratings. You can get a good deal of information just by looking at the graph, but you can also click on the link to get more. This graph represents an aggregate of many polls.
In general, I dismiss all headlines. But I definitely do not believe any headline that talks about polls, or generally about numbers.
As a self-professed passionate moderate (the liberal charismatic title was thrust upon me by an opponent), I’m very conscious of bias on both the liberal and conservative sides. To be human is to be biased. I have my moderate biases, including a bias toward considering anything from the left or the right obviously biased. You just can’t win with me!
A number of readers likely already know that FiveThirtyEight.com is one of my favorite, if not my absolute favorite, news sources. Besides their efforts to state their own biases, and the fact that I like numbers, this is a result of their efforts to cite their sources and show their work. If I question their rating of a pollster, for example, I can go look at what goes into that rating.
Before I get to the article I’m linking from them today, I want to emphasize something important. I like numbers, yes, but you have to be careful. The reason is that you have to understand how the numbers you’re liking were produced. Let me give an example. A friend asked me to read a book on the ancient world because I know the languages, and he wanted an assessment of how much credence it deserved. In the book, someone gave measurements for the original size of the Great Pyramid in millimeters. There is no way the author could actually have that information. Numbers calculated in that way are designed to give the impression of precision even when such precision does not exist.
A more common way to produce a number is to assign it, such as asking people to rate something on a scale from 1 to 10. To interpret the result, you need to know the question asked, how it’s asked, and who it’s asked of. After that you might consider what those people could actually know. For example, asking a random sample to rate the quality of cardiac care in this country on a scale from 1 to 10 produces information on how the sample views this, but might tell you as little as nothing regarding the actual state of such care, depending on who is being asked and what they could know.
So here’s the article, Psychologists Looked in the Mirror and Saw a Bunch of Liberals. (You need to read the article—the whole article. This material is useless without the reasoning behind it and the search for solutions.)
Someone noted the bias with a simple show of hands, and followed up with a study looking at the way in which results of studies were presented in journal abstracts. Here’s the generalization:
Sure enough, the abstracts more often explained their findings in terms of conservative ideas rather than liberal ones, and conservatives were described more negatively in the eyes of the raters.
The study authors tested for a bias in their raters and found that their liberal raters actually rated the abstracts as more negative regarding conservative views than did conservative raters. In a separate test, they also note that a panel of psychologists surveyed for their expectation of bias expected the results to be more biased than the study showed they were. You should, in turn, read the note on the potential problem with the panel of psychologists surveyed.
Note to self: Doing a deep enough study on an issue to have a strong opinion is a lot of work and takes a lot of time!
One of the solutions suggested is studies done by “trans-ideological teams,” i.e., having research done by people who expect different results and who then design a study based on what would change their minds on the topic. I like this idea quite a lot.
I’ll note that this has a great deal to do with the way I publish (my company). I look to create conversation between people of widely differing viewpoints. (This is not identical to creating a church congregation, where some identity is necessary. I also support diverse congregations, but the boundaries will be set up differently.) I believe that in learning, there is great value in hearing the opposing position from someone who actually supports it.
A conservative professor requiring readings from a liberal book and explaining liberal ideas is not as challenging as hearing from an actual liberal. Similarly, if you reverse liberal and conservative. I have lived and learned in situations dominated by conservatives and at other times in ones dominated by liberals. The result I see is the same: Complacency, laziness, and arrogance. One decides one doesn’t have to have support for an idea because “everybody knows that.” But this “everybody” is a very selected subset.
I don’t see any solution here except intentionally involving people who disagree. I have found for myself that I cannot truly express the support for an idea I don’t accept myself nearly as well as a person who truly does support it, even if I try diligently.
This article is encouraging to me because it attacks bias in two ways: 1) Identifying and quantifying it, and 2) Looking at ways to correct for it.
Here’s your illustration. Liberals loved Nate Silver because he calculated that Barack Obama would win the presidency, among other things. Conservatives didn’t like him so much. Now conservatives are pointing to the poor odds, though 60-40 is a ratio many politicians would covet.
I love Nate Silver not because of who he supports but because he shows his work, admits his mistakes, and has a pretty good track record. If I want to disagree with him, I can find the data in his own material. No, I don’t think he’s always right. The good thing is that he doesn’t think he’s always right.
People on both sides of the political spectrum try to make polls say what they want, or they cherry pick the poll that suits them. Newspapers tend to represent polls in whatever way will sell the most papers. It causes me to remember the book Lies, Damn Lies, and Statistics. There’s no better way to lie to people than to combine two factors: 1) Tell them what they want to hear and 2) Put some numbers in it.
We like meaning and connections, and we’ll sometimes find them even when they’re not there. People who understand this can deceive you. The Improbability Principle from NeuroLogica is a very good summary of this.
… always consider the sampling error when you report the difference between successive polls.
News organizations have been getting somewhat better, in my subjective view, at noting when a result is within the sampling error of a particular poll, but they still report increases or decreases in a lead without that note. If a candidate moves from 46% to 48% in successive polls where the margin is +/-4%, that is not a statistically significant change. And if multiple polls show results that are all within one another’s sampling errors, the polls are not scattered all over the map or telling different stories.
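The 46%-to-48% example can be made concrete. A rough rule (my own sketch, which assumes the two polls are independent simple random samples) is that the margin of error on the *change* between two polls combines the two individual margins in quadrature, so it is larger than either poll’s margin alone:

```python
import math

def difference_moe(moe_a, moe_b):
    """Rough margin of error on the change between two independent polls,
    each reporting its own margin of error (in percentage points)."""
    return math.sqrt(moe_a ** 2 + moe_b ** 2)

change = 48 - 46                # candidate moved from 46% to 48%
moe = difference_moe(4.0, 4.0)  # both polls report +/- 4 points
print(f"change of {change} pts vs +/- {moe:.1f} pts on the difference")
if change <= moe:
    print("within sampling error")  # this branch runs: 2 < 5.7
```

So a two-point move needs to be judged against a margin of nearly six points, not four, which makes the breathless “surge” headline even less defensible.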
I also wish news stories would define the various terms they use to modify “lead” or “trail.” One has no idea from the headline just what has happened.
OK, that’s my whining for the moment. 🙂
. . . or how to lie with headlines.
I get very annoyed with the reporting of polls. One way to create news is to incorrectly headline or even incorrectly describe polling data.
For example, CNN uses the headline Poll: Romney & Gingrich Tied for Top Spot in reporting on the latest USA Today/Gallup poll regarding the Republican presidential race. In the text they explain that Romney and Gingrich are at 20% and 19% respectively, that this is well within the margin of error of the poll (+/- four percentage points), and is thus essentially a tie. The number part of this is essentially correct.
Then they say that Cain is following close behind, but they don’t point out at this point that Cain’s 16% is also within the sampling error of both of the leading candidates, or rather, that the probable range of Cain’s percentages largely overlaps those of the leading two candidates.
The margin of error provides a range within which the real percentage of the whole population is likely to fall. If you go to the Gallup site for this poll you’ll find that the confidence level is 95%; in other words, there is a 95% probability that each candidate’s percentage of the real population falls within +/- 4 percentage points of the poll’s result. Thus there is a 95% chance that Cain’s percentage is between 12% & 20%, that Gingrich’s is between 15% & 23%, and that Romney’s is between 16% & 24%.
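To see just how much those ranges overlap, here is a short Python sketch using the numbers from this poll (the variable and function names are mine, for illustration):

```python
# Results from the USA Today/Gallup poll discussed above, +/- 4 points.
polls = {"Romney": 20, "Gingrich": 19, "Cain": 16}
MOE = 4

intervals = {name: (pct - MOE, pct + MOE) for name, pct in polls.items()}

def overlaps(a, b):
    """True if the two candidates' 95% intervals overlap."""
    return intervals[a][0] <= intervals[b][1] and intervals[b][0] <= intervals[a][1]

for name, (lo, hi) in intervals.items():
    print(f"{name}: {lo}%-{hi}%")
print("Cain overlaps Romney:", overlaps("Cain", "Romney"))  # prints: True
```

All three intervals overlap each other, which is exactly why “tied for top spot” applies to Cain just as much as to Gingrich.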
If you’re wondering why the polls seem to swing quite a lot among the leaders, this would be your explanation. If in a future poll, the number varies by less than four percentage points, that number would not necessarily reflect any change in that particular candidate’s support.
Essentially, the news writers can produce the story they want. It’s possible (though with multiple polls showing him dropping, it’s not likely) that Cain could still be leading this.
Now this particular headline may seem minor. But if you examine the headlines after just about any poll you’ll find that different news services spin the results differently, and that by reading the headlines and the first few paragraphs, you’ll get a somewhat different picture than you would if you read to the end of the story, or even better, go to the source of the poll.
In this story, while we are told that the difference between Romney and Gingrich falls “well within the poll’s sampling error” in the second paragraph, we don’t find the actual margin of error (+/- four percentage points) until the very last sentence. At that point, if we look back, we can see that Cain is also within that margin of error, or rather that the intervals of all three top candidates overlap considerably.
(For a write-up on this, see the Wikipedia article Margin of Error.)
Recently I’ve had several different things remind me of what I perceive to be a serious problem with numbers in this country. This can have a severe impact on one’s personal life, but also on church and social policy issues.
I recall when I argued some academic affairs committee into allowing me to count a probability and statistics course against a math-science requirement, even though it wasn’t on the list, and I have always been glad that I took that particular course as part of my own limited math education. Of the courses I took outside my own major, that one is easily the one that has contributed to my daily life.
Now the one course, and whatever reading I’ve done on the subject since, does not make me an expert. But you don’t need to be an expert to detect problems with the way people use numbers. You just need to know some basics, and then ask questions. Some of the questions don’t even require math. For example, if you read a newspaper story about sexual promiscuity that indicates that a certain percentage of teenagers are sexually active, you need to ask just how they know that. The answer can be found, though to get very specific you might have to go find the report. Reporters rarely give any of the methodology. (A second course in survey design, only partially completed though I read all the texts anyhow, helps me here.) In the survey you want to look at the questions asked to see just what the definitions are. Normally you will find that those who conducted the survey used good methodology and reported the facts in the appropriate context. It’s when the survey gets quoted that the problem starts.
Here are some of the interesting cases I’ve noticed. The Florida Lottery is advertising a new drawing. According to them, this gives you additional chances to win. Now this is one of those lines that can qualify as true, but only if you assume people will understand it in a certain way. The way consumers, especially those addicted to the medium, actually hear this is that they have a greater chance of winning. Unless you increase the number of winners while selling the same number of tickets, the probability of an individual winning does not increase. Similarly, a few years back the lottery advertised better chances of winning by placing five scratch-off patches on each ticket rather than just one. A moment’s thought will tell you that the probability of winning remains the same, since every ticket now provides five chances–every ticket.
Pepsi’s current commercial talks about the number of chances you can get. Here it’s more benign because you’re merely buying Pepsi products, which I presume you were going to buy anyhow (I could be wrong!), yet they work with the “billion” chances. If I give out a billion tickets to allow someone to win $10, but I only have one winner, then I have, truthfully, given out a billion chances. Of course Pepsi has many prizes, but the principle is the same. Here they are merely impressing everyone with large, and irrelevant, numbers. The real number that should interest you is the probability of winning any prize, or of winning a particular prize, a number that will be quite depressing.
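The distinction between “number of chances” and “probability of winning” is simple arithmetic. A sketch with made-up counts (I don’t know Pepsi’s or the lottery’s actual numbers):

```python
# Made-up counts: a billion "chances" but only one winner.
entries = 1_000_000_000
winners = 1
p_win = winners / entries
print(f"probability per entry: {p_win:.0e}")  # prints: probability per entry: 1e-09

# Five scratch-off patches per ticket instead of one changes nothing
# when every ticket gets five patches and the winner count is fixed:
tickets = 1_000_000
winning_tickets = 10
p_ticket = winning_tickets / tickets
print(p_ticket)  # 1e-05 -- the same whether each ticket has 1 patch or 5
```

The number of “chances” can be multiplied indefinitely without moving the only figure that matters: winners divided by entries.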
Then there are the polls. Reporters have gotten much better at pointing out the margin of error, though they seem to miss the decimal portion of it frequently. A 3.7 margin of error is closer to four than to three, and I’ve seen a couple of cases where two candidates were actually within the margin of error but were reported as outside of it. Then people regularly miss (and are not told) the percentage chance that the result is outside of the margin of error. What I’ve noticed more in the last few days, however, is that reporters will note a trend when the difference between the previous figure for a candidate and the current one is still within the margin of error. I would point out, as well, that the margin of error is not a line inscribed in steel, in other words it doesn’t switch from “certain to be correct” to “certain to be incorrect” on the dot.
Then there is the division of demographic groups. I’m not really talking statistical measures here, but rather our need to divide and classify things. I don’t even object to this division. It’s necessary to analysis. But it’s useful to remember in thinking about these groups that people’s attitudes don’t undergo a radical shift on the line between 25 and 26, or at the point where they begin to make $50,001 annually. People are pretty analog. Analysis tends to be binary.
I want to mention one last church-related issue. I remember a conversation with a pastor who informed me that most (I forget the particular number, but I think the percentage was in the 50s) people who were looking for a church in our neighborhood were looking for a traditional worship experience. The immediate assumption was that the road to church growth was by providing such a service and focusing on it. Now that might be true. But don’t forget the 40+ percent. Before those numbers have good context to provide a basis for decision making, we need to know how many churches are providing a traditional service and how many are providing something more free-flowing with contemporary music, amongst many other factors. Many businesses operate with the purpose of providing services to the minority in a community, those with specialized wants and needs.
For whatever reasons we place greater weight on an argument that has numbers in it. When I went to the emergency room a couple months back with abdominal pain, the nurse wanted me to rate it from 1 to 10. Now the fact is that I have experienced remarkably little pain in my life. How do I come up with a number? Painful enough to get me to the ER, but what number to assign? Once we have a number for the record, however, we feel that we have more accurate information. Those numbers, however, are only as good as the data collection method that produced them.
Statistical information could be extremely valuable, but it is also subject to abuse. That’s not because of an inherent weakness in the method, but because so few people take the time to take the numbers apart and understand what they’re saying. Thus the unscrupulous, or just the numerically challenged, can deceive us too easily.
(For those without math training, let me recommend a couple of books: How to Lie with Statistics, which is old but fun, and Lies, Damned Lies, and Statistics. I have seen some reviews that accuse the latter book of a conservative bias, and it may have one based on the selection of stories, but I think he does well in analyzing the data for each case he does cite.)