Sampling Distributions

Introduction

Sampling distributions represent a troublesome topic for many students. However, they are important because they are the basis for making statistical inferences about a population from a sample.

Sampling distributions solve two problems for us. First, they provide a basis for using samples to make inferences about populations. Second, they explain how our formula must change when we use a sample to test a hypothesis instead of a single x-value. Recall the formula we have been using to solve z-score problems:

z = (x - μ) / σ

This formula uses the standard deviation, σ, in the denominator because the standard deviation tells us how far individual x-values tend to vary within a distribution.
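
As a quick numeric illustration, here is a minimal sketch of that calculation in Python; the mean, standard deviation, and x-value are made-up numbers chosen only for the example.

    # z-score for a single x-value: how many standard deviations x lies from the mean
    mu = 100     # hypothetical population mean
    sigma = 15   # hypothetical population standard deviation
    x = 130      # a single observed x-value

    z = (x - mu) / sigma
    print(z)     # 2.0 -> x lies two standard deviations above the mean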

The problem we have now is that we want to do hypothesis testing with a sample of values rather than a single x-value. The standard deviation tells us about the variability of individual x-values, not about the variability of samples. Sampling distributions will show us how to measure the variability among samples, and thus the probability of observing a particular sample mean.


Sampling Distribution of the Mean

If I wanted to form a sampling distribution of the mean, I would:

1. Sample repeatedly from the population
2. Calculate the statistic of interest (the mean)
3. Form a distribution based on the set of means I obtain from the samples

The set of means I obtain forms a new distribution: a sampling distribution, in this case the sampling distribution of the mean. A short simulation sketch of these steps appears below, followed by a demonstration that gives a visual representation.
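
In this sketch (written in Python, with a made-up population and arbitrary choices of sample size and number of samples), each pass through the loop carries out the three steps above:

    import numpy as np

    rng = np.random.default_rng(0)
    population = rng.normal(loc=50, scale=10, size=10_000)   # hypothetical population

    sample_means = []
    for _ in range(5_000):                        # 1. sample repeatedly from the population
        sample = rng.choice(population, size=25)  # draw one sample of 25 values
        sample_means.append(sample.mean())        # 2. calculate the statistic of interest (the mean)

    sample_means = np.array(sample_means)         # 3. the set of means forms the sampling distribution
    print(population.mean(), sample_means.mean()) # the two means come out very close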


In the demonstration, a small population of four values is represented. Every possible combination of values from the population is sampled to form a true sampling distribution of the mean. Note, however, that for any real population the number of possible samples would be enormous, which makes sampling distributions theoretical in nature rather than something we could construct in practice.
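
As a minimal sketch of what such an exhaustive demonstration might look like, the Python below enumerates every possible sample of size two, drawn with replacement, from a hypothetical four-value population; both the values and the sample size are assumptions chosen only for the example.

    from itertools import product
    from collections import Counter

    population = [2, 4, 6, 8]   # hypothetical four-value population

    # every possible sample of size 2, drawn with replacement
    samples = list(product(population, repeat=2))
    sample_means = [sum(s) / len(s) for s in samples]

    print(len(samples))           # 16 possible samples
    print(Counter(sample_means))  # how often each mean value occurs

Even in this tiny case the means pile up near the center (a mean of 5 occurs most often) and thin out toward the extremes.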


This demonstration illustrates Rule 1 of the Central Limit Theorem: the mean of the population and the mean of the sampling distribution of means will always have the same value. This rule is important to hypothesis testing because, even though any one sample we test will not be exactly like the population, on average the samples will match it. We can be sure that repeated experiments will yield sample means close to the population mean, and that in the long run their average will equal it exactly.
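
A minimal check of Rule 1, reusing the same hypothetical four-value population as in the sketch above:

    population = [2, 4, 6, 8]                    # hypothetical population
    pop_mean = sum(population) / len(population)

    # the mean of every possible size-2 sample, drawn with replacement
    means = [(a + b) / 2 for a in population for b in population]
    mean_of_means = sum(means) / len(means)

    print(pop_mean, mean_of_means)   # 5.0 5.0 -> the two means are identical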

Rule 2 of the Central Limit Theorem states:  the sampling distribution of the mean will be normal regardless of the shape of the population distribution. Whether the population distribution is normal, positively or negatively skewed, unimodal or bimodal in shape, the sampling distribution of the mean will have a normal shape. 

In the following example we start out with a uniform distribution. The mean values we obtain will still vary from sample to sample, so the sampling distribution of the mean contains variability. Because each sample draws values from all parts of the population, most sample means land close to the center of the population distribution, and means far out in the tails are rare. Thus the sampling distribution of the mean will have a normal shape, even though the population distribution does not.
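
To make that concrete, here is a small simulation sketch in Python; the population range, sample size, and number of samples are arbitrary choices made only for the example.

    import numpy as np

    rng = np.random.default_rng(1)
    population = rng.uniform(low=0, high=100, size=100_000)   # flat (uniform), not normal

    # means of many repeated samples of size 30
    sample_means = np.array([rng.choice(population, size=30).mean()
                             for _ in range(10_000)])

    print(population.mean(), sample_means.mean())  # both land near 50, the center
    counts, _ = np.histogram(sample_means, bins=15)
    print(counts)   # counts rise to a peak near the center and fall away: a bell shape

Even though every value in the uniform population is equally likely, the sample means pile up around the center, giving the bell shape Rule 2 describes.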