A Guide for the Statistically Perplexed

The task of polling is to get information from a few people and use it to learn about the larger population. (In political polling, the population is usually all likely voters.) Why not just ask everybody? Because sampling is cheaper.

But how big a sample do you need?

- The desire for accuracy says "as large as you can afford."
- But the bottom line says "as small as you can poll accurately."

Jessica Utts, a statistics expert at the University of California, Davis, says you can usually estimate the margin of error by taking the square root of the sample size (n) and then dividing 1 by that number:

What is the margin of error if we sample 1,600 people?

The square root of 1,600 = 40, and 1/40 = .025, or 2.5%. Thus the margin of error is 2.5%.

Because the statistic can be smaller or larger than the true amount, we say "the margin of error is plus or minus 2.5%." The reported proportion plus or minus the margin of error is called the confidence interval.
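The rule of thumb above can be sketched in a few lines of Python (the function names are ours, not from the article):

```python
import math

def margin_of_error(n):
    """Rough margin of error (as a fraction) for a sample of size n: 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def confidence_interval(proportion, n):
    """The reported proportion plus or minus the margin of error."""
    moe = margin_of_error(n)
    return (proportion - moe, proportion + moe)

moe = margin_of_error(1600)
print(f"Margin of error for n=1600: {moe:.1%}")  # 2.5%, matching the worked example

low, high = confidence_interval(0.52, 1600)  # e.g., a candidate polling at 52%
print(f"Confidence interval: {low:.1%} to {high:.1%}")  # 49.5% to 54.5%
```

So a candidate reported at 52% in a 1,600-person poll could plausibly be anywhere from 49.5% to 54.5%.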

If (and **only** if) you're interested in **scientific (non-political) statistics**, read on.
In testing a scientific hypothesis, the goal is to find out whether the results could be due to chance. When the statistics say there's only a 5% probability of getting such convincing results if chance alone is responsible, we call the results statistically significant. (You may see the following gibberish to indicate significance: "p < .05".)
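As a hedged illustration of that "p < .05" idea (our own toy example, not from the article): suppose a coin lands heads 60 times in 80 flips. Under chance alone (a fair coin), how likely is a result at least this extreme?

```python
import math

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n fair-coin flips."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p_value(k, n):
    """Sum the probabilities of every outcome at least as unlikely as k heads."""
    observed = binom_pmf(k, n)
    return sum(binom_pmf(i, n) for i in range(n + 1)
               if binom_pmf(i, n) <= observed + 1e-12)

p = two_sided_p_value(60, 80)
print(f"p = {p:.6f}")  # far below .05, so we'd call this "statistically significant"
```

Here p comes out tiny, so chance alone is a poor explanation; if it had come out above .05, we could not rule chance out.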

As you can see from the graph, the first increases in sample size produce the biggest benefits:

- Going from 100 to 1,000 cases will decrease your margin of error from 10% to about 3%.
- But going from 1,000 to 10,000 cases will only reduce the margin to 1%. So a 10-fold increase in sampling expense gives us a puny increase in accuracy.
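The diminishing returns in the list above fall straight out of the 1/sqrt(n) rule of thumb:

```python
import math

# Margin of error shrinks with the square root of the sample size,
# so each 10-fold increase in n only cuts the margin by a factor of ~3.16.
for n in (100, 1_000, 10_000):
    print(f"n = {n:6d}: margin of error is about {1 / math.sqrt(n):.1%}")
```

Printing roughly 10%, 3.2%, and 1%: tripling your accuracy the first time costs 900 extra interviews, but the next improvement costs 9,000.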

Why can this simple calculation predict the accuracy of a statistic?