7.3 Distribution of the Sample Mean; Central Limit Theorem
In section 7.1, we looked at the expected value and variance of the sample
mean $\bar{X}$. Now, we'll look at the type of distribution that $\bar{X}$ has.
We'll use two important results about normal random variables.
- If X is a normal r.v. with mean $\mu$ and variance $\sigma^2$, and c is a
  constant, then Y = cX is normal, with mean $c\mu$ and variance $c^2\sigma^2$.
- If $X_1$ and $X_2$ are independent normal random variables with means
  $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, then
  $X_1 + X_2$ is normally distributed, with mean $\mu_1 + \mu_2$ and
  variance $\sigma_1^2 + \sigma_2^2$.
The information about means and variances is old hat; what's new is the fact
that cX and $X_1 + X_2$ are themselves normal.
To summarize:
- a constant times a normal r.v. is normal
- the sum of independent normals is normal.
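The bookkeeping in these two rules can be illustrated with a short Python sketch. It assumes Python 3.8 or later, where the standard library's `statistics.NormalDist` supports multiplication by a constant and addition of two independent normal distributions. The numbers are hypothetical, chosen only for illustration, and the sketch applies the rules rather than proving them.

```python
from statistics import NormalDist

# Hypothetical example values, chosen only for illustration.
X = NormalDist(mu=10, sigma=3)      # variance 9
c = 2

# A constant times a normal r.v.: mean is multiplied by c, variance by c^2.
Y = X * c
print(Y.mean, Y.variance)

# Sum of two independent normals: means add, variances add.
X1 = NormalDist(mu=1, sigma=3)      # variance 9
X2 = NormalDist(mu=2, sigma=4)      # variance 16
S = X1 + X2
print(S.mean, S.variance)
```

Here Y should have mean 20 and variance 36, and S should have mean 3 and variance 25 (standard deviation 5, not 3 + 4 = 7: it is the variances that add, not the standard deviations).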
The proofs of these assertions use properties of moment generating functions,
in particular the following:
if X and Y have the same moment generating function, $m_X(t) = m_Y(t)$ for
all t in an interval around 0, then X and Y have the same distribution
(density function).
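As a sketch of how such a proof goes, assume the standard formula for the moment generating function of a normal r.v. X with mean $\mu$ and variance $\sigma^2$ (the formula itself is not derived in this section):

$$
m_X(t) = E\left[e^{tX}\right] = e^{\mu t + \sigma^2 t^2 / 2}.
$$

Then, for a constant c, and for independent normal $X_1$, $X_2$:

$$
\begin{aligned}
m_{cX}(t) &= E\left[e^{t c X}\right] = m_X(ct) = e^{(c\mu) t + (c^2 \sigma^2) t^2 / 2}, \\
m_{X_1 + X_2}(t) &= m_{X_1}(t)\, m_{X_2}(t) = e^{(\mu_1 + \mu_2) t + (\sigma_1^2 + \sigma_2^2) t^2 / 2}.
\end{aligned}
$$

The first right-hand side is the moment generating function of a normal r.v. with mean $c\mu$ and variance $c^2\sigma^2$; the second is that of a normal r.v. with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$. By the uniqueness property above, cX and $X_1 + X_2$ must have exactly those normal distributions. Independence is what allows the moment generating function of the sum to factor into a product.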
Distribution of the sample mean
From the above two properties it follows that:
if independent random variables $X_1, X_2, \ldots, X_n$ are a random sample
from a normal distribution with mean $\mu$ and variance $\sigma^2$,
then the sample mean $\bar{X}$ is normally distributed with mean
$\mu_{\bar{X}} = \mu$ and variance $\sigma_{\bar{X}}^2 = \sigma^2/n$.
Thus if the original population has a normal distribution, then the sample
means from samples of some (fixed) size n will also be normally distributed,
with the same mean but a smaller variance (and standard deviation). The
density curve for the sample means will thus be bell-shaped, and centered
at the same location as the density curve for the population, but will
be narrower.
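The step from the two properties to this result can be spelled out as follows (a short derivation, writing $\mu$ and $\sigma^2$ for the population mean and variance and $\bar{X}$ for the sample mean):

$$
\begin{aligned}
\bar{X} &= \frac{1}{n}\,(X_1 + X_2 + \cdots + X_n), \\
X_1 + \cdots + X_n &\sim \text{Normal with mean } n\mu \text{ and variance } n\sigma^2
  \quad \text{(sum rule, applied } n-1 \text{ times)}, \\
\bar{X} &\sim \text{Normal with mean } \tfrac{1}{n}\cdot n\mu = \mu
  \text{ and variance } \tfrac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}
  \quad \text{(constant rule with } c = \tfrac{1}{n}\text{)}.
\end{aligned}
$$

The standard deviation of $\bar{X}$ is therefore $\sigma/\sqrt{n}$, which is why the density curve for the sample means narrows as n grows.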
Using the information about the distribution of $\bar{X}$
Example:
Raising trout in a fish hatchery; suppose lengths (of 2 year olds)
are normally distributed, with mean $\mu$ = 7.3 inches and
standard deviation $\sigma$ = 1.6 inches.
Take samples of size 70, and compute the sample mean $\bar{X}$
for each sample; then the sample means
will be normally distributed, with mean $\mu_{\bar{X}}$ = 7.3 inches
and standard deviation $\sigma_{\bar{X}} = 1.6/\sqrt{70} \approx .19$ inches.
Graphs of densities: the density graph for the sample means from
the various samples will be a normal curve, centered at the same location
(7.3 inches) as the population density curve, but will be much narrower:
the standard deviation of the sample means is just .19, while that for
the population is 1.6. Thus the values of the sample mean for various samples
will be centered at the population mean, but will tend to vary much less
on either side than individuals from the population.
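The comparison of the two density curves can be reproduced numerically. The sketch below uses the trout figures from the example and `statistics.NormalDist` (Python 3.8 or later); the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 7.3, 1.6, 70          # values from the trout example

population = NormalDist(mu, sigma)
sample_mean = NormalDist(mu, sigma / sqrt(n))   # sd of the sample mean is sigma/sqrt(n)

# Both curves peak at mu; the sample-mean curve is taller because it is narrower.
peak_ratio = sample_mean.pdf(mu) / population.pdf(mu)

print(round(sample_mean.stdev, 2))   # standard deviation of the sample mean
print(round(peak_ratio, 2))          # equals sqrt(n), up to rounding
```

Since the area under each curve is 1, a curve that is narrower by a factor of $\sqrt{70} \approx 8.4$ must also be taller at its peak by that same factor.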
By the normal probability rule, since $\bar{X}$ is normally distributed,
the probability that $\bar{X}$ will lie within 2 standard deviations of its
mean is about .95;
i.e., the probability that $\bar{X}$ will lie within .38 units of the
population mean is about .95 (since the standard deviation of $\bar{X}$ is
.19, and the mean of $\bar{X}$ is the same as the population mean);
i.e., about 95% of the samples we choose will have $\bar{X}$ lying within
.38 units of the population mean.
Thus:
For 95% of the samples we draw, $\bar{X}$ will lie within .38" of the value
of $\mu$.
This gives us an idea of how reliable the value of $\bar{X}$ from a sample
will be as an estimate of the value of $\mu$:
95% of the time, the population mean $\mu$ will be within .38 inches of the
value found for the sample mean $\bar{X}$.
This is the fundamental idea behind confidence intervals (discussed in
the next section).
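The 95% figure in the trout example can be checked numerically. The sketch below again relies on `statistics.NormalDist` and uses the rounded margin of .38 inches from the text; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 7.3, 1.6, 70          # values from the trout example
xbar_dist = NormalDist(mu, sigma / sqrt(n))

margin = 0.38   # two standard deviations of the sample mean, rounded as in the text

# P(mu - margin < sample mean < mu + margin)
prob_within = xbar_dist.cdf(mu + margin) - xbar_dist.cdf(mu - margin)
print(round(prob_within, 3))
```

The result should come out slightly above .95, consistent with the "2 standard deviations" rule being a convenient approximation.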
Central Limit Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample
from a population having any distribution (with mean $\mu$ and
variance $\sigma^2$), not necessarily a normal
distribution; then for large n, the sample mean $\bar{X}$
will be approximately normally distributed (with mean $\mu$ and
variance $\sigma^2/n$).
- This is surprising: regardless of the distribution of the population, the
  distribution of the sample mean will be approximately normal for large n!
- Thus we can use normal calculations with the sample mean even when the
  distribution of the population isn't normal.
- The proof involves looking at moment generating functions.
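A small simulation can make the theorem more concrete. The sketch below is an illustration under assumptions that are not part of the notes: the population is taken to be exponential with rate 1 (mean 1, variance 1, strongly skewed), the sample size is n = 50, and 2000 samples are drawn with a fixed seed. It uses only the Python standard library.

```python
import random
from statistics import fmean, pvariance

def sample_means(n, reps, rng):
    """Return `reps` sample means, each computed from n exponential(1) draws."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

rng = random.Random(0)      # fixed seed so the run is repeatable
n, reps = 50, 2000          # assumed sample size and number of samples
means = sample_means(n, reps, rng)

mu, var = 1.0, 1.0          # mean and variance of the exponential(1) population
se = (var / n) ** 0.5       # standard deviation predicted for the sample mean

# Fraction of sample means within 2 predicted standard deviations of mu.
within_2se = sum(abs(m - mu) < 2 * se for m in means) / reps

print(fmean(means))         # should be close to mu = 1
print(pvariance(means))     # should be close to var/n = 0.02
print(within_2se)           # should be close to 0.95 if the normal approximation is good
```

Even though individual exponential values are far from normal, the sample means should cluster around 1 with variance near 0.02, and roughly 95% of them should fall within two standard deviations of the population mean, as the normal probability rule predicts.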