Quality Progress - February 2018 - 54

Statistics Spotlight
Statistical application

FIGURE 1

The theorem is derived under the assumption of an infinite population
whose observations are independent and identically distributed, with
constant, finite mean and variance. Using basic calculus, it is not
difficult to prove and is often included in high school curricula.
Further, for this infinite population with mean, μ, and standard
deviation, σ, there is the added assumption that for our sample means
to be approximately normally distributed we must take sufficiently
large random samples from the population with replacement. What
de Moivre showed was that this will hold true

Uniform distribution of the numbers 1-10 and mean 5.5

TABLE 1

Means of 3 from a uniform distribution of the numbers 1-10

Take samples of size 3     Average
 1     2     3             2.0
 1     2     4             2.3
 1     2     5             2.7
 1     2     6             3.0
 1     2     7             3.3
 1     2     8             3.7
 1     2     9             4.0
 1     2     10            4.3
 ...   ...   ...           ...
 6     8     9             7.7
 6     8     10            8.0
 6     9     10            8.3
 7     8     9             8.0
 7     8     10            8.3
 7     9     10            8.7
 8     9     10            9.0
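The rows of Table 1 can be reproduced with a short Python sketch. It assumes, based on the rows shown, that the table lists every combination of three distinct numbers from 1-10 in sorted order; the variable names are illustrative and not taken from the article.

```python
from itertools import combinations

population = range(1, 11)  # the numbers 1-10, each equally likely

# Every way to choose 3 distinct numbers, in the same sorted order as the table.
samples = list(combinations(population, 3))
sample_means = [sum(s) / len(s) for s in samples]

# Print the first few rows in the table's layout: three values, then the average.
for s, m in list(zip(samples, sample_means))[:3]:
    print(*s, f"{m:.1f}")

# The average of all the sample means, for comparison with the population mean.
mean_of_means = sum(sample_means) / len(sample_means)
print(len(samples), "samples; mean of the sample means:", round(mean_of_means, 1))
```

Under that assumption there are 120 such samples, and the average of their means matches the population mean of 5.5 even though individual sample means range from 2.0 to 9.0.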


regardless of the distribution of the source population.
In its most familiar form, this theorem does not apply to sampling
from a finite population, such as the number of factories an
organization owns or the number of subway riders per day.4 Two
important modifications of the CLT were necessary before statisticians
could apply the results to finite populations and sampling without
replacement. Andrey Markov showed that the theorem can be relaxed
for use with dependent sampling (without replacement), and Lévy
showed that the properties the CLT establishes for theoretical
distributions can also be applied to empirical distributions (that is, real data).5
In general, statisticians assume that whether the underlying distribution
is normal or skewed, provided the sample size is sufficiently
large (usually n > 30), the distribution of the sample mean will be
approximately normal. If the population is already normal, this holds
true even for samples smaller than 30. In practice, this means we can
use the normal probability model to quantify uncertainty when making
inferences about a population mean based on the sample mean.
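In symbols, the classical result can be written as follows. This is a standard textbook statement, not the article's own notation: \bar{X}_n (the mean of n independent observations) and \bar{x} (one observed sample mean) are symbols assumed here, alongside the article's μ and σ.

```latex
\[
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \;\xrightarrow{d}\; N(0, 1)
\quad \text{as } n \to \infty,
\qquad \text{so for large } n: \quad
\bar{X}_n \;\approx\; N\!\left(\mu, \frac{\sigma^2}{n}\right).
\]
% Under this normal approximation, an approximate 95% interval for the
% population mean, based on one observed sample mean, is:
\[
\bar{x} \;\pm\; 1.96 \, \frac{\sigma}{\sqrt{n}}
\]
```

The interval is only as good as the approximation behind it, which rests on the independence and random-sampling assumptions described earlier.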
However, the essential point of the CLT is that it describes the
distribution of sample means: across repeated samples, that distribution
approaches the normal distribution, and the mean of the sample means
will be the same as the population mean. It does not describe a specific
mean from one specific sample, which is how the CLT is often used today.
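The distinction can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the skewed population is an exponential distribution with mean 1 and standard deviation 1, each sample holds 40 independent draws, and the seed, sample counts, and names are all chosen here for illustration rather than taken from the article.

```python
import random
import statistics

random.seed(2018)  # fixed seed so the run is repeatable

POP_MEAN = 1.0      # an exponential distribution with rate 1 has mean 1 and sd 1
POP_SD = 1.0
SAMPLE_SIZE = 40    # "sufficiently large" by the usual n > 30 rule of thumb
NUM_SAMPLES = 5000  # how many repeated samples to draw


def draw_sample_mean(n):
    """Draw n independent values from the skewed population and return their mean."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


sample_means = [draw_sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = POP_SD / SAMPLE_SIZE ** 0.5  # sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean;
# under a normal model this share should be close to 0.95.
within = sum(
    abs(m - POP_MEAN) <= 1.96 * expected_sd for m in sample_means
) / NUM_SAMPLES

print(f"one specific sample mean:  {sample_means[0]:.3f}")
print(f"mean of all sample means:  {mean_of_means:.3f}")
print(f"sd of sample means:        {sd_of_means:.3f} (sigma/sqrt(n) = {expected_sd:.3f})")
print(f"share within 1.96 se:      {within:.3f}")
```

Any single sample mean can land some distance from the population mean. It is the collection of sample means that centers on the population mean and spreads according to σ/√n.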
We are now analyzing large data sets drawn from nonrandomized samples
and from samples taken without replacement. The CLT, while very
generalizable, was developed before the advent of computers and the
age of big data.
Now, it's too easy to have too much data and therefore be magnitudes
