Bad Use of Statistics and Polling

by Dr. Fred Worth, Department of Mathematics, Henderson State University, posted May 29, 1996. Dr. Worth notes that much of the material for this paper came from the book "How to Lie with Statistics" and Reader's Digest.

     The world today almost seems to run on statistics, surveys and
polls.  Especially considering that this is an election year, it is
essential that people understand what statistics say and do not say
and when they are or are not reliable.  [note:  Much of the data
contained in this paper comes from the book "How to Lie with
Statistics."  The book was published over 40 years ago so some of
the data is outdated.]
 
     Statistics can cause a great deal of worry.  We have all heard
"the average child walks/crawls/talks/etc. by the age of ... ."  A
mother hears that and thinks, "Oh no, my child doesn't walk yet,
what's wrong with him?"  Very likely, there is nothing wrong with
the child.  My son didn't walk until he was about 19 months.  He
didn't crawl until around the same time.  Was there something wrong
with him?  No, he was just very heavy and needed to develop more
strength than most kids do.  Variation is important to consider.
 
     IQ testing is another area that can cause a great deal of
stress, especially for parents.  Suppose you want to measure off a
football field.  Your stride is about 1 yard long.  So you step off
100 yards.  How accurate will it be?  Within 3 feet one way or
another?  Long by more than 3 feet?  Short by more than 3 feet?
Reasonable guesses (where P(A) denotes the probability that event
A happens) would put P(you land within 3 feet of the true length)
well below 1, and P(long by more than 3 feet) and P(short by more
than 3 feet) well above 0; the field you step off could plausibly
measure anywhere from about 97 to 103 yards.
     IQ tests (by themselves) are likely no more accurate than that
as a measure of intelligence.  All kinds of other factors may enter
in.  Background, health on the day of the test, what you read just
before you took the test, fatigue, etc.
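 
     For a concrete feel for how much such a measurement can wander,
here is a small simulation sketch in Python.  The numbers in it are
assumptions made up for illustration: a typical stride that is only
roughly 36 inches, and individual paces that wobble a couple of
inches around it.

      import random

      # Assumed numbers, purely for illustration: your "about a yard"
      # stride is drawn once per attempt (36 inches, give or take an
      # inch), and each pace wobbles about 2 inches around it.
      def step_off_field(strides=100):
          typical = random.gauss(36.0, 1.0)   # your true average stride, inches
          total = sum(random.gauss(typical, 2.0) for _ in range(strides))
          return total / 36.0                 # length stepped off, in yards

      trials = 10_000
      lengths = [step_off_field() for _ in range(trials)]
      print("P(97 <= L <= 103):", sum(97 <= L <= 103 for L in lengths) / trials)
      print("P(L > 103):       ", sum(L > 103 for L in lengths) / trials)
      print("P(L < 97):        ", sum(L < 97 for L in lengths) / trials)

With these made-up spreads, being off by more than a few yards is
not at all unusual, which is the point.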
 
     Suppose the statistics ARE accurate.  What do they prove?
Cigarette brand X has 50% less tar/nicotine/carbon monoxide/etc.
than the other leading brands.  Does that make them healthy?
 
     More doctors smoke brand X than any other brand.  That may
very well be true.  Does that mean that smoking brand X is a
healthy choice?
 
     A mechanical juicer was advertised as being able to extract
"26% more juice."  That sounds pretty good until one realizes that
the ad did not say "26% more juice" than what.  It turns out it was
26% more juice than a hand juicer.  It was not being compared to a
comparable product.
 
     More car accidents occur in clear weather than foggy weather.
We therefore conclude that it is safer to drive in the fog than in
clear weather, right?  Of course not.  Is the statistic false?  No.
How meaningful is the statistic?  Not very.  Consider that, in most
places, clear weather is much more common than foggy weather.
Consider also that when it is foggy, many people will avoid
driving.  Thus, it is to be expected that more accidents will occur
during clear weather than foggy weather.  A more useful statistic
to consider is NOT number of accidents but number of accidents per
mile driven.
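 
     To see why the rate matters, here is a quick Python sketch with
made-up numbers; they are not real accident figures, just an
illustration of the normalization.

      # Made-up figures, purely to illustrate accidents per mile.
      clear = {"accidents": 9_000, "miles_driven": 900_000_000}
      fog   = {"accidents":   300, "miles_driven":  10_000_000}

      for weather, d in (("clear", clear), ("fog", fog)):
          rate = d["accidents"] / d["miles_driven"] * 1_000_000
          print(weather, d["accidents"], "accidents,",
                round(rate, 1), "accidents per million miles")

With these figures, clear weather produces thirty times as many
accidents, yet fog is three times as dangerous per mile driven.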
 
     It was reported that the death rate in the Navy during the
Spanish-American war was 9 per 1000.  The death rate in New York
City during the Spanish-American war was 16/1000.  Are we then to
conclude that, during the late 19th century, it was safer to be in
the Navy during a war than it was to live in New York City?  That
is simply absurd.  The two statistics do not compare the same kinds
of populations.  The Navy would have young, healthy adults.  New
York would have a far more diverse population.  It would include
infants, the elderly, the ill.  All of those populations have much
higher mortality rates than young, healthy adults.
 
     Arkansas ranks 49th in ... .  We hear statistics like that all
of the time.  The intended reaction is "Oh, no.  We have to
improve" whatever it is that was being measured.  Consider,
however, that no matter what is being measured, if you have 50
states being ranked, one of them must be 49th.  That of itself does
not mean that any of the states are deficient.  "Joe Smith is the
lowest-paid major league baseball player."  Does that mean Joe is ready
for the poverty line?  Consider that all major league players make
more than $100,000 per year.  That hardly qualifies them for public
assistance.
 
     A study once showed that students who smoke have lower grades.
Some people concluded from that data that smoking causes lower
grades.  Such a conclusion is not valid without considerably more
evidence.  The problem here is distinguishing between causation and
correlation.  Simply having two events happen together does NOT
mean either one necessarily caused the other.
 
     There are more weddings in June than any other month.  There
are more suicides in June than any other month.  Therefore, we
conclude that weddings cause suicide.  Again, we have an absurd
conclusion.  How many of the suicides were people who recently
married or knew someone who recently married?  Is it possible that
other factors could better explain the data?  It would be just as
silly to say that since June has more weddings and that June has
only 30 days, that all of the weddings cause June to have fewer
days.
 
     Cornell did a study (approx. 1950) of middle-aged alumni.  It
showed that 93% of male graduates were married and 65% of female
graduates were married.  Female college graduates were three times
as likely to be unmarried as female non-college graduates.
Therefore, some people concluded that graduating from college makes
it less likely a woman will marry.  Such a conclusion ignores vital
information.  Isn't it possible that those women who married during
their college careers were more likely to drop out to pursue family
goals rather than educational or career goals?
 
     Another vital concern is how reliable the data are.
 
     Suppose I ask 100 people how much they slept last night.  I
add up their answers and get 783.1 hours.  I report the average
person slept 7.831 hours.  Sounds like a precise study.  It is
worthless.  Do you really know how long you slept last night?
 
     Around 1949, Yale took a survey of their graduates.  The
survey showed that the "average" salary of Yale grads was $25,111
per year (a princely sum in those days).  What does this mean?
Does it prove that Yale graduates make more money than those of
other colleges, or more than people in general?  Actually, no, it
doesn't.
 
     Several questions must be considered.  First, what "average"
was used?  There are three frequently used statistics that are
often called averages.  First is the mean.  It is what most people
think of when they think of an average.  The mean is obtained by
adding up the numbers and dividing by the number in the sample.
For example the mean of 5, 6, 6, 7 and 11 is (5 + 6 + 6 + 7 + 11)/5
or 7.  Second, is the median.  The median measures the middle
value.  The median of 5, 6, 6, 7 and 11 is 6.  Lastly, the mode
denotes the value that occurs most frequently.  The mode of 5, 6,
6, 7 and 11 is 6.  Each of these "averages" are useful but all can
be deceptive.
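 
     For the sample above, Python's standard statistics module
computes all three "averages" directly.

      from statistics import mean, median, mode

      sample = [5, 6, 6, 7, 11]
      print(mean(sample))     # 7, add them up and divide by how many there are
      print(median(sample))   # 6, the middle value once the list is sorted
      print(mode(sample))     # 6, the value that occurs most often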
 
     Back to the Yale salary study.  If "average" implied the mean,
consider the following possibility.  Suppose 100 graduates answered
the survey, one made $2,000,000 and one made $500,000.  Then the
other 98 "average" $113 per year.
 
     If the mode was used, it is possible that 2 graduates made
$25,111 while all the others made under $1000 per year.
 
 
     In general, the median is a better measure for populations
with possibly wide variations, like salary.  But even if the median
is used, is this useful information?  How was the study done?  It
was done by mail and depended on the respondent giving his/her
salary.  This creates several problems.  Are people honest?  They
may lie, giving a higher salary (to look good) or giving a lower
salary (lied to IRS so they want to stay consistent).  They may not
know how much they make so they may have guessed.  It is highly
probable that they did not get all of the living graduates to
respond since some would have whereabouts unknown (likely
unsuccessful) or others would not want to respond (likely
unsuccessful).  These various problems, unless addressed somehow,
make the study relatively useless.
 
     Suppose I want to do a study of the average height of people
who play miniature golf at a particular miniature golf course.  I
compute the averages and get a mean height of 4'8" and a median height of
4'8".  I thus conclude that most of the people there that day are
around 4'8".  Good conclusion?  Well, it turns out that it was
father-son day at the local day care.  So I found 24 fathers with
their 3-year-old kids and one 10-year-old brother.
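 
     A short Python sketch makes the problem plain.  The heights
below are assumptions, not measurements: fathers around 5'10",
three-year-olds around 3'1", and one ten-year-old around 4'7".

      from statistics import mean, median

      # Assumed heights in inches: 24 fathers, 24 three-year-olds,
      # and one ten-year-old brother.
      heights = [70] * 24 + [37] * 24 + [55]

      print(round(mean(heights), 1))   # about 53.5 inches
      print(median(heights))           # 55 inches, the lone ten-year-old

Both "averages" land on a height that describes almost nobody
actually standing on the course.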
 
     Variance is also a valuable tool when doing statistics, but
then reporting results starts getting so complicated that the
general public (and the media reporting the statistics) is not
going to understand whether the information is valid or not.
 
     It is sort of like the guy who had his head in a hot oven and
his feet in liquid nitrogen who said "on the average I feel fine."
 
     A classic example of a bad use of statistics occurred during
the presidential election of 1936.  Literary Digest magazine did a
presidential election poll in 1932.  Their results were very
accurate.  So they did it again in 1936 using the same methodology.
Their results showed Alf Landon would win over Franklin Roosevelt
by an electoral vote margin of 370 - 161.  The actual results were
a little different.  Roosevelt won 523 - 8.
 
     What happened?  They did their survey via a telephone poll of
their subscribers.  Why would that be a problem?  What happened in
1929?  The stock market crashed and the depression began.  By 1932,
things were getting bad but not too bad.  By 1936, things were
getting really bad.  People gave up "unnecessary" things like
magazine subscriptions and phones.  Who still had magazine
subscriptions and phones?  Rich people.  In the 1930s rich people
tended to be Republicans so it was natural that a majority of those
polled would vote for the Republican candidate.
 
     Another problem to consider is emotion.  Whenever you deal
with emotionally charged issues you need to be especially careful.
 
     Abortion is a prime example.  Both pro-abortion groups and
pro-life groups can show polls that show that the majority of the
public supports their view.  How do we word those questions?  If I
want to show support for a pro-abortion view, I would ask "Should
women be allowed to make their own medical decisions without the
intrusion of an overbearing federal bureaucracy?"  If I want to
show support for a pro-life view, I would ask "Should doctors be
allowed to mercilessly butcher innocent little babies?"  Obviously
the wording affects the response.
 
     It is often reported that approximately 10% of the population
is homosexual.
That statistic has been used so many times that it has come to be
accepted as fact by many people including the media.  It is
instructive to look at the study from which those numbers were
derived.
 
     In the 1940s, Dr. Alfred Kinsey did a study at Indiana
University.  It is in this study that the oft-repeated claim of 10%
homosexuality originated.  This study, however, was badly flawed.
To take results of a sample and draw inferences about the entire
population one must be sure that the sample is representative of
the entire population.  Kinsey used primarily volunteers for his
study.  He used second and third groups of volunteers, many of whom
were referred by the first group.  These facts by themselves make
any comparison to the actual population highly suspect.  In
addition to this, however, is the fact that a disproportionate
percentage of Kinsey's study was made up of prison inmates.  One of
the results of Kinsey's study was that about 10% of his sample was
"more or less" exclusively homosexual for about three years of
their lives.  This is the statistic that people claim "proves" that
10% of the population is homosexual.  A careful reading shows that
not even Kinsey, with his poor methodology, supports the claim of
10% homosexuality.  Kinsey's results only show four per cent of
males and one per cent of females to be exclusively homosexual for
more than three years of their lives.
 
     Between 1981 and 1984, three different studies (published by
Indiana University Press, Journal of Psychology and Theology and
Playboy Press) all came up with results very different from 10%.
They found that 96% of the population consider themselves
exclusively heterosexual.  Between one and three per cent consider
themselves exclusively homosexual.
 
     In a recent study, a group from the University of Chicago
reported that only one per cent of Americans identify themselves as
homosexuals.
 
     In the July 3, 1992 issue of Science magazine, a study of
human sexuality was reported which showed less than 4.1% of the
male population had participated in homosexual intercourse at least
once in their life.  Only 1.1% of the male population had
participated in homosexual intercourse in the past year.  In the
female population, the rate of lesbian activity was 2.6% for at
least once in the lifetime and 0.3% in the past year.
 
     It is also often reported that heterosexuals are responsible
for more cases of child abuse than homosexuals.  This is true.  But
the intent of many who report that data is to try to make it seem
that heterosexuals are more likely to be abusers.  While
homosexuals make up between one and two per cent of the
population, they account for more than 33% of all child abuse.
Thus, homosexuals are between 24 and 48 times more likely than
heterosexuals to be child abusers.
 
     Another emotionally charged statistical claim is that men earn
more than women.  Many problems come up in such a statistic.  Are
they comparing similar jobs?  Are they considering years in the
job?  Are men more likely to put in more hours because of women
spending more time at home with their children?
 
     Some statements are worded so as to make them sound good.  In
1948, it was reported "Today, electric power is available to more
than 3/4 of U.S. farms."  It could have been worded "Almost 1/4 of
U.S. farms do not have electrical power available to them."  Also,
note that it said "available."  It didn't say that they had it.
 
     Suppose I say, "Today, Mercedes Benz automobiles are available
to more than 80% of the American public."  This could just mean
that more than 80% of the American public lives within 50 miles of
a Mercedes dealer.
 
     Graphs of data, even though they contain no words, can also be
very deceptive.  Consider the graph below, which shows the
production of widgets by two competing factories.
 
      Factory A  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
      Factory B  XXX
 
The graph seems to indicate that Factory A makes considerably more
widgets than Factory B.  The graph is deceptive because no labels
are given.  Here is the graph again, with labels.
 
      Factory A  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
      Factory B  XXX
                 1950                                        2000
 
It now is clear that the difference in production between the two
is not very significant.
 
     Another example of a deceptive graph was once used to
demonstrate the amount of tax money collected by the federal
government.  A map of the United States was shown.  States were
shaded in.  The taxes collected by the federal government were
supposed to be equal to the amount of revenue produced by those
states.  [Keep in mind that this study was over 40 years ago, so it
was before the large population shift to California and the
southwest.]  Shaded in were Washington, Oregon, California, Nevada,
Idaho, Montana, Wyoming, Utah, Arizona, New Mexico, Colorado, North
Dakota, South Dakota, Nebraska, Kansas, Oklahoma, Texas, Minnesota,
Iowa and most of Missouri.  That covered more than half of the
country.  The map could just as easily have been done shading New York,
Pennsylvania, New Jersey, Massachusetts and Connecticut.  That
shading covered very little of the country.  Both were correct
since the eastern states at that time, though much smaller,
produced significantly more revenue than the western states.
 
     Polls can be used to show the opposite of the truth.
Princeton's Office of Public Opinion Research once did a poll of
people to determine racial attitudes.  People were asked if they
felt that blacks had as good a chance to get a job as a white person.
Other questions were asked to determine the racial attitudes of the
person.  Two-thirds of those who were sympathetic to blacks said
that blacks had a poorer chance of getting a job.  Two-thirds of
those showing prejudice felt blacks had an equal chance of getting
a job.  Thus, the poll could be done during a time of relative
racial harmony, then again during a time of racial strife.  As
racial attitudes worsened, the poll could show "Blacks have an
increasing likelihood of being able to get jobs."
 
     The frame of reference is vital to proper interpretation of
statistics.  Suppose you have an investment.  Last year you got a
return of 3%.  This year you have a return of 6%.  You could say
your return was 3% better than last year.  You could say your
return was 100% better than last year.  Both statements are "true"
though the certainly give a very different impression.
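 
     The two statements are just two ways of expressing the same
change, as a quick calculation shows.

      last_year, this_year = 0.03, 0.06

      print(round((this_year - last_year) * 100))               # 3 percentage points better
      print(round((this_year - last_year) / last_year * 100))   # 100 percent better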
 
     Another example of poor use of statistics and polls is
"Selective Reporting."  In January 1995, the U.S. Agency for
International Development (AID) reported (to great media response)
that Americans support foreign aid.  A Reader's Digest poll showed
that 86% of the people don't know what AID does.  It also showed
that 67% want foreign aid cut.  The same question was asked, in the
context of cutting the deficit, and 83% said cut foreign aid.  So
where did AID get their information?  A poll asked if the US should
"share at least a small portion of its wealth with those in the
world who are in great need."  Over 80% said yes.  They were asked
if foreign aid should be cut a little, somewhat, a lot or
"eliminate it entirely."  8% chose the last option.  They took this
to mean 92% "support" foreign aid.  They ignored that 75% said
there is "too much" foreign aid.
 
     "2 out of 3 doctors recommend XYZ toothpaste."  How did they
decide that?  Easy.  Find two doctors who recommend XYZ and then
find one other doctor who doesn't.
 
     Polls show support for increasing spending on PBS and support
for decreasing spending.  If you want to get support for decreasing
spending then emphasize the budget deficit.  If you want to get
support for increasing spending then emphasize violence on regular
television.
 
     Polls show support for increasing spending on the National
Endowment for the Arts and support for decreasing spending.  If you
want to get support for decreasing spending then emphasize the
budget deficit and morals.  If you want to get support for
increasing spending then emphasize censorship and freedom of
speech.
 
     Probability is another source of confusion with statistics.
In basketball, if a player who typically makes 50% of his shots
misses five in a row, we tend to think that something is badly
wrong.  However, the probability of that 50% shooter having at
least 5 consecutive misses in any span of 11 shots is 255/2048 =
.1245, almost 1 in 8.  It is not at all rare.
 
     The probability that, in a span of 6 sets of 11 shots, he
will miss at least 5 consecutive shots in one of those sets of 11
shots is 1 - (1 - 255/2048)^6 = .5497, better than a 50-50 chance.
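 
     Neither figure needs to be taken on faith.  With only 2^11 =
2048 equally likely make-and-miss sequences, a short Python program
can check both by brute force.

      from itertools import product

      # Count the make/miss sequences of 11 shots (0 = miss, 1 = make)
      # that contain a run of at least 5 consecutive misses.
      def has_long_miss_streak(shots, streak=5):
          run = longest = 0
          for made in shots:
              run = 0 if made else run + 1
              longest = max(longest, run)
          return longest >= streak

      sequences = list(product([0, 1], repeat=11))
      bad = sum(has_long_miss_streak(s) for s in sequences)
      p = bad / len(sequences)
      print(bad, "out of", len(sequences), "=", round(p, 4))   # 255 out of 2048 = 0.1245

      # Chance of such a streak in at least one of 6 independent spans of 11 shots.
      print(round(1 - (1 - p) ** 6, 4))                        # about 0.5497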
 
     If I flipped a coin and got heads 75% of the time the tendency
might be to think that it must be a biased coin.  The problem here
is the need to know how many tosses are involved.  If there were
only 4 tosses then the probability of 3 heads is 1/4.  With a
"fair" coin, as the number of tosses gets "large," the proportion
of heads gets close to 1/2.  I wrote a computer program to
simulate 10,000 tosses of a coin.  I ran it 10 times.  Here are
the results.
 
               HEADS          TAILS
               4974           5026
               5036           4964
               4963           5037
               5000           5000
               5024           4976
               4995           5005
               5038           4962
               4983           5017
               4989           5011
               5023           4977
             50,025         49,975
 
The greatest disparity was in the 7th run.  There were 76 more
"heads" than "tails."  But that only means 76/100 of 1% difference.
Overall there is only a 1/20 of 1% difference.
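 
The original program is not given; a present-day version of the same
experiment, in Python, might look like this.

      import random

      # Ten runs of 10,000 tosses of a simulated fair coin.
      total_heads = total_tails = 0
      for run in range(1, 11):
          heads = sum(random.random() < 0.5 for _ in range(10_000))
          tails = 10_000 - heads
          total_heads += heads
          total_tails += tails
          print(run, heads, tails)
      print("overall", total_heads, total_tails)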
 
In standardized testing, percentiles are used to compare scores.
Suppose in a collection of 300 students, a test is given with 200
possible points.  Suppose John is at the 99th percentile, Dave is
at the 90th percentile, Mike is at the 60th percentile and Bill is
at the 40th percentile.  Let J denote John's score, D Dave's
score, etc.  We might guess that J - D is the smallest of the
three gaps, since only nine percentile points separate John and
Dave, while thirty separate Dave and Mike and twenty separate Mike
and Bill.  But percentiles only rank students; they say nothing
about how far apart the scores are.  On a test where most scores
bunch up near the middle, the few students at the top are spread
out, and J - D can easily be as large as, or larger than, D - M or
M - B.
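 
     A small Python sketch shows the effect.  The scores below are
made up, drawn from a bell-shaped distribution centered at 120
points; they are not data from any real test.

      import random
      from statistics import quantiles

      # 300 made-up scores on a 200-point test, bunched near the middle.
      random.seed(1)
      scores = sorted(min(200, max(0, round(random.gauss(120, 25))))
                      for _ in range(300))

      pct = quantiles(scores, n=100)     # pct[k-1] is the k-th percentile
      john, dave, mike, bill = pct[98], pct[89], pct[59], pct[39]
      print("J - D =", john - dave)      # 9 percentile points apart
      print("D - M =", dave - mike)      # 30 percentile points apart
      print("M - B =", mike - bill)      # 20 percentile points apart

With scores piled up in the middle, the nine percentile points
between John and Dave typically translate into a wider score gap
than the twenty percentile points between Mike and Bill.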
 

