Statistical sampling works because of the **law of large numbers**.

Think of this as the statistician's answer to cost overruns. The law of large numbers says that as a random sample gets bigger, so does the chance that it accurately reflects the whole population.

If you don't think a random sample can represent a much larger group, look at the situation backwards. If you selected 1,600 people at random from the U.S. population, how likely is it that they would **not** represent the whole?
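To see the law at work, here is a minimal simulation sketch. It assumes a made-up population in which exactly half the people hold some opinion (the `true_share` of 0.5, the function name, and the number of trials are all illustrative choices, not figures from any real poll). It draws many random samples of each size and reports how far, on average, a sample's share lands from the truth.

```python
import random

def average_sampling_error(sample_size, true_share=0.5, trials=200, seed=0):
    """Average absolute gap between a sample's share and the true share.

    true_share, trials, and seed are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_gap = 0.0
    for _ in range(trials):
        # Each simulated person holds the opinion with probability true_share.
        hits = sum(1 for _ in range(sample_size) if rng.random() < true_share)
        total_gap += abs(hits / sample_size - true_share)
    return total_gap / trials

for n in (10, 100, 1600):
    print(n, round(average_sampling_error(n), 3))
```

The typical miss shrinks steadily as the sample grows, and at 1,600 people it is down to roughly a percentage point.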

If the law is so great, why would you ever want to increase the sample size beyond 1,000 or so? (I'm thinking of polls and studies with thousands, or even hundreds of thousands, of respondents.) Because:

- You want to know about subgroups within the sample (women, or women smokers, or women smokers in New York City or Topeka, Kansas, for example). Remember, every time the sample size decreases, the accuracy does as well. Thus if we had only 25 women smokers in a sample of 1,600 people, the margin of error for that subgroup would be 20%, far worse than the 2.5% margin for the whole study.
- You are studying something rare, like the incidence of a rare disease, or whether a vaccine really works.
- The poll is an excuse for something unrelated to legitimate public opinion research, like generating publicity for a product, organization, or cause. In these cases, the larger the sample, the merrier.
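The 20% and 2.5% figures in the subgroup example are consistent with a common rule of thumb: the margin of error is roughly 1 divided by the square root of the sample size. That formula is an assumption here, since it is not spelled out above, and it is only an approximation for a simple random sample at about 95% confidence. A quick sketch (the function name is made up for illustration):

```python
import math

def rough_margin_of_error(sample_size):
    """Rule-of-thumb margin of error, about 1 / sqrt(n).

    Assumes a simple random sample and roughly 95% confidence.
    """
    return 1 / math.sqrt(sample_size)

for label, n in [("whole sample", 1600), ("women smokers", 25)]:
    print(f"{label}: n={n}, margin ~ {rough_margin_of_error(n):.1%}")
# whole sample: n=1600, margin ~ 2.5%
# women smokers: n=25, margin ~ 20.0%
```

Because the margin shrinks with the square root of the sample size, cutting it in half takes four times as many respondents. That is why a study that cares about small subgroups needs a much bigger sample overall.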

Reader Beware

A legitimate political poll should come with some information to help you assess it: the number of people contacted, when the poll was conducted, and the margin of error. The margin is typically phrased as "accurate to plus or minus 3 percentage points," for a true range of uncertainty of 6 percentage points. A candidate reported at 50 percent, for example, could really be anywhere from 47 to 53 percent.

That's All There Is To It?

Sorry. We need to discuss some limitations. First of all, the margin of error is conventionally quoted at a 95 percent confidence level, which means that one time in 20, the results can be outside the margin of error. So if you read a lot of polls, some of them will be wrong. It's part of the statistical game of chance.
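The one-in-20 figure can be checked with a small simulation. This is only a sketch under stated assumptions: a made-up population where the true share is 0.5, polls of 1,000 people each, and a margin computed at the conventional 95% confidence level with the standard formula 1.96 × sqrt(p(1 − p) / n). None of those numbers, and none of the names below, come from a real poll.

```python
import math
import random

def share_of_polls_outside_margin(polls=1000, sample_size=1000,
                                  true_share=0.5, seed=1):
    """Fraction of simulated polls that miss the truth by more than the margin.

    All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # 95% margin of error for a simple random sample.
    margin = 1.96 * math.sqrt(true_share * (1 - true_share) / sample_size)
    misses = 0
    for _ in range(polls):
        hits = sum(1 for _ in range(sample_size) if rng.random() < true_share)
        if abs(hits / sample_size - true_share) > margin:
            misses += 1
    return misses / polls

print(share_of_polls_outside_margin())
```

The printed fraction should land somewhere near 0.05, or about one poll in 20, even though every simulated poll here was designed and executed perfectly.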

More important, remember that the margin of error is only valid if the poll was **perfectly** designed and **perfectly** executed. That means no errors in writing the questionnaire. And it means making sure everybody was treated equally.
