by Dr. Fred Worth, Department of Mathematics, Henderson
State University, posted May 29, 1996. Dr. Worth notes that much
of the material for this paper came from the book "How to
Lie with Statistics" and Reader's Digest.
The world today almost seems to run on statistics, surveys and polls. Especially considering that this is an election year, it is essential that people understand what statistics say and do not say, and when they are or are not reliable. [note: Much of the data contained in this paper comes from the book "How to Lie with Statistics." The book was published over 40 years ago so some of the data is outdated.]

Statistics can cause a great deal of worry. We have all heard "the average child walks/crawls/talks/etc. by the age of ... ." A mother hears that and thinks, "Oh no, my child doesn't walk yet, what's wrong with him?" Very likely, there is nothing wrong with the child. My son didn't walk until he was about 19 months. He didn't crawl until around the same time. Was there something wrong with him? No, he was just very heavy and needed to develop more strength than most kids do. Variation is important to consider.

IQ testing is another area that can cause a great deal of stress, especially for parents. Suppose you want to measure off a football field. Your stride is about 1 yard long, so you step off 100 yards. How accurate will your measurement be? Within 3 feet one way or the other? Long by more than 3 feet? Short by more than 3 feet? Reasonable guesses (where P(A) denotes the probability that event A happens) would give substantial values to each of P(between 97 and 103 yards), P(long by more than 3 feet) and P(short by more than 3 feet). IQ tests (by themselves) are likely no more accurate than that as a measure of intelligence. All kinds of other factors may enter in: background, health on the day of the test, what you read just before you took the test, fatigue, etc.

Suppose the statistics ARE accurate. What do they prove? Cigarette brand X has 50% less tar/nicotine/carbon monoxide/etc. than the other leading brands. Does that make it healthy? More doctors smoke brand X than any other brand. That may very well be true. Does that mean that smoking brand X is a healthy choice? A mechanical juicer was advertised as being able to extract "26% more juice."
That sounds pretty good until one realizes that the ad did not say "26% more juice" than what. It turns out it was 26% more juice than a hand juicer. It was not being compared to a comparable product.

More car accidents occur in clear weather than foggy weather. We therefore conclude that it is safer to drive in the fog than in clear weather, right? Of course not. Is the statistic false? No. How meaningful is the statistic? Not very. Consider that, in most places, clear weather is much more common than foggy weather. Consider also that when it is foggy, many people will avoid driving. Thus, it is to be expected that more accidents will occur during clear weather than foggy weather. A more useful statistic to consider is NOT the number of accidents but the number of accidents per mile driven.

It was reported that the death rate in the Navy during the Spanish-American War was 9 per 1000, while the death rate in New York City during the same period was 16 per 1000. Are we then to conclude that, during the late 19th century, it was safer to be in the Navy during a war than it was to live in New York City? That is simply absurd. The two statistics do not compare the same kinds of populations. The Navy would have young, healthy adults. New York would have a far more diverse population, including infants, the elderly and the ill. All of those groups have much higher mortality rates than young, healthy adults.

Arkansas ranks 49th in ... . We hear statistics like that all of the time. The intended reaction is "Oh, no. We have to improve" whatever it is that was being measured. Consider, however, that no matter what is being measured, if you have 50 states being ranked, one of them must be 49th. That of itself does not mean that any of the states are deficient.

"Joe Smith is the lowest paid major league baseball player." Does that mean Joe is ready for the poverty line? Consider that all major league players make more than $100,000 per year.
That hardly qualifies them for public assistance.

A study once showed that students who smoke have lower grades. Some people concluded from that data that smoking causes lower grades. Such a conclusion is not valid without considerably more evidence. The problem here is distinguishing between causation and correlation. Simply having two events happen together does NOT mean either one necessarily caused the other. There are more weddings in June than any other month. There are more suicides in June than any other month. Therefore, we conclude that weddings cause suicide. Again, we have an absurd conclusion. How many of the suicides were people who recently married or knew someone who recently married? Is it possible that other factors could better explain the data? It would be just as silly to say that, since June has more weddings and June has only 30 days, the weddings cause June to have fewer days.

Cornell did a study (approx. 1950) of middle-aged alumni. It showed that 93% of male graduates were married and 65% of female graduates were married. Female college graduates were three times as likely to be unmarried as female non-college graduates. Therefore, some people concluded that graduating from college makes it less likely a woman will marry. Such a conclusion ignores vital information. Isn't it possible that those women who married during their college careers were more likely to drop out to pursue family goals rather than educational or career goals?

Another vital concern is how reliable the data are. Suppose I ask 100 people how much they slept last night. I add up their answers and get 783.1 hours. I report that the average person slept 7.831 hours. Sounds like a precise study. It is worthless. Do you really know how long you slept last night?

Around 1949, Yale took a survey of their graduates. The survey showed that the "average" salary of Yale grads was $25,111 per year (a princely sum in those days). What does this mean?
Does it prove that Yale graduates make more money than those of other colleges, or more than people in general? Actually, no, it doesn't. Several questions must be considered.

First, what "average" was used? There are three frequently used statistics that are often called averages. First is the mean. It is what most people think of when they think of an average. The mean is obtained by adding up the numbers and dividing by the number in the sample. For example, the mean of 5, 6, 6, 7 and 11 is (5 + 6 + 6 + 7 + 11)/5, or 7. Second is the median. The median measures the middle value. The median of 5, 6, 6, 7 and 11 is 6. Lastly, the mode denotes the value that occurs most frequently. The mode of 5, 6, 6, 7 and 11 is 6. Each of these "averages" is useful, but all can be deceptive.

Back to the Yale salary study. If "average" meant the mean, consider the following possibility. Suppose 100 graduates answered the survey, one made $2,000,000 and one made $500,000. Then the other 98 could "average" only about $113 per year. If the mode was used, it is possible that 2 graduates made $25,111 while all the others made under $1000 per year. In general, the median is a better measure for populations with possibly wide variations, like salary.

But even if the median is used, is this useful information? How was the study done? It was done by mail and depended on the respondents giving their salaries. This creates several problems. Are people honest? They may lie, giving a higher salary (to look good) or a lower salary (they lied to the IRS, so they want to stay consistent). They may not know exactly how much they make, so they may have guessed. It is highly probable that the study did not reach all of the living graduates, since some would have whereabouts unknown (likely unsuccessful) and others would not want to respond (also likely unsuccessful). These various problems, unless addressed somehow, make the study relatively useless.
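The three "averages" and the skewed-salary scenario above are easy to check directly. The sketch below uses the hypothetical figures from the Yale discussion, not real data:

```python
import statistics

data = [5, 6, 6, 7, 11]
print(statistics.mean(data))    # 7
print(statistics.median(data))  # 6
print(statistics.mode(data))    # 6

# Hypothetical Yale scenario: two large salaries drag the mean up to
# roughly $25,111 even though 98 of the 100 respondents made only $113.
salaries = [2_000_000, 500_000] + [113] * 98
print(round(statistics.mean(salaries)))  # 25111
print(statistics.median(salaries))       # 113.0
```

The median's resistance to the two outliers is exactly why the text recommends it for wide-variation populations like salaries.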
Suppose I want to do a study of the average height of people who play miniature golf at a particular miniature golf course. I take my measurements and get a mean height of 4'8" and a median height of 4'8". I thus conclude that most of the people there that day are around 4'8". Good conclusion? Well, it turns out that it was father-son day at the local day care, so I found 24 fathers with their 3-year-old kids and one 10-year-old brother. Variance is also a valuable tool when doing statistics, but reporting it makes results complicated enough that the general public (and the media reporting the statistics) is not going to be able to tell whether the information is valid or not. It is sort of like the guy who had his head in a hot oven and his feet in liquid nitrogen who said, "On the average I feel fine."

A classic example of a bad use of statistics occurred during the presidential election of 1936. Literary Digest magazine did a presidential election poll in 1932. Their results were very accurate, so they did it again in 1936 using the same methodology. Their results showed Alf Landon would win over Franklin Roosevelt by an electoral vote margin of 370 - 161. The actual results were a little different: Roosevelt won 523 - 8. What happened? They did their survey via a telephone poll of their subscribers. Why would that be a problem? What happened in 1929? The stock market crashed and the depression began. By 1932, things were getting bad but not too bad. By 1936, things were getting really bad. People gave up "unnecessary" things like magazine subscriptions and phones. Who still had magazine subscriptions and phones? Rich people. In the 1930s rich people tended to be Republicans, so it was natural that a majority of those polled would vote for the Republican candidate.

Another problem to consider is emotion. Whenever you deal with emotionally charged issues you need to be especially careful. Abortion is a prime example.
Both pro-abortion groups and pro-life groups can show polls indicating that the majority of the public supports their view. How do we word those questions? If I want to show support for a pro-abortion view, I would ask "Should women be allowed to make their own medical decisions without the intrusion of an overbearing federal bureaucracy?" If I want to show support for a pro-life view, I would ask "Should doctors be allowed to mercilessly butcher innocent little babies?" Obviously the wording affects the response.

It is often reported that approximately 10% of the population is homosexual. That statistic has been used so many times that it has come to be accepted as fact by many people, including the media. It is instructive to look at the study from which those numbers were derived. In the 1940s, Dr. Alfred Kinsey did a study at the University of Indiana. It is in this study that the oft-repeated claim of 10% homosexuality originated. This study, however, was badly flawed. To take the results of a sample and draw inferences about the entire population, one must be sure that the sample is representative of the entire population. Kinsey used primarily volunteers for his study. He used second and third groups of volunteers, many of whom were referred by the first group. These facts by themselves make any comparison to the actual population highly suspect. In addition, a disproportionate percentage of Kinsey's study was made up of prison inmates. One of the results of Kinsey's study was that about 10% of his sample was "more or less" exclusively homosexual for about three years of their lives. This is the statistic that people claim "proves" that 10% of the population is homosexual. A careful reading shows that not even Kinsey, with his poor methodology, supports the claim of 10% homosexuality. Kinsey's results only show four percent of males and one percent of females to be exclusively homosexual for more than three years of their lives.
Between 1981 and 1984, three different studies (published by Indiana University Press, the Journal of Psychology and Theology, and Playboy Press) all came up with results very different from 10%. They found that 96% of the population consider themselves exclusively heterosexual. Between one and three percent consider themselves exclusively homosexual. In a recent study, a group from the University of Chicago reported that only one percent of Americans identify themselves as homosexuals. In the July 3, 1992 issue of Science magazine, a study of human sexuality was reported which showed that less than 4.1% of the male population had participated in homosexual intercourse at least once in their lives. Only 1.1% of the male population had participated in homosexual intercourse in the past year. In the female population, the rate of lesbian activity was 2.6% for at least once in the lifetime and 0.3% in the past year.

It is also often reported that heterosexuals are responsible for more cases of child abuse than homosexuals. This is true. But the intent of many who report that data is to try to make it seem that heterosexuals are more likely to be abusers. While homosexuals make up between one and two percent of the population, they account for more than 33% of all child abuse. Thus, homosexuals are between 24 and 48 times more likely than heterosexuals to be child abusers.

Another emotionally charged statistical claim is that men earn more than women. Many problems come up with such a statistic. Are they comparing similar jobs? Are they considering years in the job? Are men more likely to put in more hours because women spend more time at home with their children?

Some statements are worded so as to make them sound good. In 1948, it was reported, "Today, electric power is available to more than 3/4 of U.S. farms." It could have been worded, "Almost 1/4 of U.S. farms do not have electrical power available to them." Also, note that it said "available."
It didn't say that they had it. Suppose I say, "Today, Mercedes Benz automobiles are available to more than 80% of the American public." This could just mean that more than 80% of the American public lives within 50 miles of a Mercedes dealer.

Graphs of data, even though they contain no words, can also be very deceptive. Consider the graph below, which shows the production of widgets by two competing factories.

Factory A  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Factory B  XXX

The graph seems to indicate that Factory A makes considerably more widgets than Factory B. The graph is deceptive because no labels are given. Here is the graph again, with labels.

Factory A  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Factory B  XXX
           1950                                     2000

It now is clear that the difference in production between the two is not very significant.

Another example of a deceptive graph was once used to demonstrate the amount of tax money collected by the federal government. A map of the United States was shown, with states shaded in. The taxes collected by the federal government were supposed to be equal to the amount of revenue produced by those states. [Keep in mind that this study was over 40 years ago, so it was before the large population shift to California and the southwest.] Shaded in were Washington, Oregon, California, Nevada, Idaho, Montana, Wyoming, Utah, Arizona, New Mexico, Colorado, North Dakota, South Dakota, Nebraska, Kansas, Oklahoma, Texas, Minnesota, Iowa and most of Missouri. That covered more than half of the country. The map could just as easily have been done shading New York, Pennsylvania, New Jersey, Massachusetts and Connecticut. That shading covered very little of the country. Both maps were correct, since the eastern states at that time, though much smaller, produced significantly more revenue than the western states.

Polls can be used to show the opposite of the truth. Princeton's Office of Public Opinion Research once did a poll of people to determine racial attitudes.
People were asked if they felt that blacks had as good a chance to get a job as whites. Other questions were asked to determine the racial attitudes of the person. Two-thirds of those who were sympathetic to blacks said that blacks had a poorer chance of getting a job. Two-thirds of those showing prejudice felt blacks had an equal chance of getting a job. Thus, the poll could be done during a time of relative racial harmony, then again during a time of racial strife. As racial attitudes worsened, the poll could show "Blacks have an increasing likelihood of being able to get jobs."

The frame of reference is vital to proper interpretation of statistics. Suppose you have an investment. Last year you got a return of 3%. This year you have a return of 6%. You could say your return was 3% better than last year. You could say your return was 100% better than last year. Both statements are "true," though they certainly give very different impressions.

Another example of poor use of statistics and polls is "selective reporting." In January 1995, the U.S. Agency for International Development (AID) reported (to great media response) that Americans support foreign aid. A Reader's Digest poll showed that 86% of the people don't know what AID does. It also showed that 67% want foreign aid cut. When the same question was asked in the context of cutting the deficit, 83% said cut foreign aid. So where did AID get its information? A poll asked if the US should "share at least a small portion of its wealth with those in the world who are in great need." Over 80% said yes. Respondents were asked if foreign aid should be cut a little, somewhat, a lot or "eliminate it entirely." Only 8% chose the last option. AID took this to mean that 92% "support" foreign aid. They ignored that 75% said there is "too much" foreign aid.

"2 out of 3 doctors recommend XYZ toothpaste." How did they decide that? Easy. Find two doctors who recommend XYZ and then find one other doctor who doesn't.
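The two "true" statements about the investment return above come from two different calculations. A quick sketch makes the distinction explicit:

```python
last_year, this_year = 0.03, 0.06  # returns of 3% and 6%

# "3% better": the difference in percentage points.
point_change = (this_year - last_year) * 100

# "100% better": the change relative to last year's return.
relative_change = (this_year - last_year) / last_year * 100

print(f"{point_change:.0f} percentage points higher, "
      f"or {relative_change:.0f}% higher relative to last year")
```

Neither number is wrong; a careful report simply says which one it is using.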
Polls show support for increasing spending on PBS and support for decreasing spending. If you want to get support for decreasing spending, emphasize the budget deficit. If you want to get support for increasing spending, emphasize violence on regular television. Polls show support for increasing spending on the National Endowment for the Arts and support for decreasing spending. If you want to get support for decreasing spending, emphasize the budget deficit and morals. If you want to get support for increasing spending, emphasize censorship and freedom of speech.

Probability is another source of confusion with statistics. In basketball, if a player who typically makes 50% of his shots misses five in a row, we tend to think that something is badly wrong. However, the probability of that 50% shooter having at least 5 consecutive misses somewhere in a span of 11 shots is 255/2048 = .1245, almost 1 in 8. It is not at all rare. The probability that, in a span of 6 sets of 11 shots, he will miss at least 5 consecutive shots in one of those sets is 1 - (1 - 255/2048)^6 = .5497, better than a 50-50 chance.

If I flipped a coin and got heads 75% of the time, the tendency might be to think that it must be a biased coin. The problem here is the need to know how many tosses were involved. If there were only 4 tosses, then the probability of a fair coin giving 3 heads is 1/4. With a "fair" coin, as the number of tosses gets "large," the proportion of heads approaches 1/2. I wrote a computer program to simulate 10,000 tosses of a coin. I ran it 10 times. Here are the results.

 HEADS   TAILS
  4974    5026
  5036    4964
  4963    5037
  5000    5000
  5024    4976
  4995    5005
  5038    4962
  4983    5017
  4989    5011
  5023    4977
------  ------
50,025  49,975

The greatest disparity was in the 7th run, where there were 76 more "heads" than "tails." But that is a difference of only 76/100 of 1%. Overall there is only a 1/20 of 1% difference.

In standardized testing, percentiles are used to compare scores.
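Both probability figures above, the 255/2048 shooting-drought chance and the behavior of 10,000 coin tosses, can be checked with a short program. This is only a sketch; the simulated counts will of course differ from run to run (and from the author's runs), which is why a seed is fixed here:

```python
import random
from itertools import product

# Enumerate all 2^11 equally likely 11-shot sequences for a 50% shooter
# (0 = miss, 1 = make) and count those containing >= 5 consecutive misses.
droughts = sum(1 for seq in product("01", repeat=11)
               if "00000" in "".join(seq))
print(droughts, "/ 2048 =", droughts / 2048)  # 255 / 2048 = 0.1245...

# Chance of such a drought in at least one of 6 independent spans of 11 shots.
p = 1 - (1 - droughts / 2048) ** 6
print(round(p, 4))  # ~0.55, better than a 50-50 chance

# Simulate 10,000 tosses of a fair coin (seed chosen only for repeatability).
random.seed(1)
heads = sum(random.randint(0, 1) for _ in range(10_000))
print(heads, "heads,", 10_000 - heads, "tails")
```

Brute-force enumeration is feasible here because there are only 2048 sequences; it confirms the 255/2048 figure without any run-length combinatorics.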
Suppose in a collection of 300 students, a test is given with 200 possible points. Suppose John is at the 99th percentile, Dave is at the 90th percentile, Mike is at the 60th percentile and Bill is at the 40th percentile. Let J denote John's score, D Dave's score, etc. We might guess that J - D