In Chapter 3, we learned how to evaluate estimators (Bias, Efficiency) and construct Confidence Intervals. However, nearly all our calculations relied on a crucial assumption:
“Assume the population is Normally Distributed.”
But real data is often skewed, discrete, or just weird. What do we do then?
Callback to Last Semester (S2)
You have actually already seen the solution! Recall the normal approximations from S2:
Binomial → Normal: If $X \sim B(n, p)$ and $n$ is large, $X \approx N(np, npq)$.
Poisson → Normal: If $X \sim Po(\lambda)$ and $\lambda$ is large, $X \approx N(\lambda, \lambda)$.
These weren’t just random lucky coincidences. They were specific instances of a much more powerful universal law: the Central Limit Theorem (CLT).
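As a quick numerical check of the binomial case, the sketch below compares an exact binomial probability with its normal approximation. The values $n = 100$, $p = 0.4$, $k = 45$ and the helper names are assumptions chosen for illustration, and the approximation uses the usual continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal approximation N(np, npq) to P(X <= k), with continuity correction."""
    q = 1 - p
    return NormalDist(mu=n * p, sigma=sqrt(n * p * q)).cdf(k + 0.5)


# Illustrative values (assumed, not taken from the notes).
n, p, k = 100, 0.4, 45
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact  P(X <= {k}) = {exact:.4f}")
print(f"normal P(X <= {k}) = {approx:.4f}")
```

For these values the two probabilities should agree to about two decimal places; rerunning with a small $n$ shows the approximation getting worse.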
In this chapter, we generalize those S2 approximations to any distribution.
Goal 1: Use CLT to perform inference on a single sample mean from any distribution.
Goal 2: Use CLT to compare two sample means from different distributions.
Central Limit Theorem (informal)
If we take a large random sample of size $n$ from any population with mean $\mu$ and variance $\sigma^2$,
then the sample mean $\bar{X}$ is approximately normally distributed with
$$\bar{X} \approx N\!\left(\mu, \frac{\sigma^2}{n}\right),$$
no matter what the original population looks like (as long as it has finite variance).
Theorem: Central Limit Theorem
Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables with
$$E[X_i] = \mu, \qquad \mathrm{Var}(X_i) = \sigma^2 < \infty.$$
Then as $n \to \infty$,
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \;\Longrightarrow\; N(0, 1),$$
that is, the distribution of $Z$ tends to the standard normal distribution.
For large $n$ this gives the useful approximation
$$\bar{X} \approx N\!\left(\mu, \frac{\sigma^2}{n}\right).$$
When Can We Use CLT?
Independence: $X_1, X_2, \ldots, X_n$ should be independent.
Identical distribution: all $X_i$ come from the same population.
Finite variance: $\sigma^2$ should be finite (no extremely heavy tails).
Sample size large: as a rule of thumb, $n \ge 30$ is often enough; more may be needed if the population is very skewed.
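To see the theorem at work when these conditions hold, here is a small simulation sketch. It assumes an Exponential(1) population (strongly skewed, with $\mu = 1$ and $\sigma^2 = 1$); the population, the sample size $n = 50$, the number of repetitions and the seed are all illustrative choices, not part of the notes.

```python
import random
from statistics import fmean, variance


def simulate_sample_means(n: int, reps: int, seed: int = 0) -> list:
    """Draw `reps` independent samples of size n from Exp(1); return their means."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


# Exp(1) has mu = 1 and sigma^2 = 1, so the CLT predicts X-bar ≈ N(1, 1/n).
n, reps = 50, 5000
means = simulate_sample_means(n, reps)
print(f"mean of sample means:     {fmean(means):.4f}  (CLT predicts 1)")
print(f"variance of sample means: {variance(means):.4f}  (CLT predicts {1 / n:.4f})")

# Fraction of sample means within 1.96 standard errors of mu (near 0.95 if normal).
se = (1 / n) ** 0.5
coverage = sum(abs(m - 1) < 1.96 * se for m in means) / reps
print(f"fraction within 1.96 SE:  {coverage:.3f}")
```

Lowering `n` to 5 or so should make the mismatch with the normal prediction more visible, which is the point of the sample-size rule of thumb.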
Example: Discrete General Distribution
Consider a highly volatile asset. Its annual return $R$ follows a discrete distribution:
Loss (-10%): Probability $0.2$
Break-even (0%): Probability $0.5$
Gain (+20%): Probability $0.3$
This distribution is discrete and not symmetric.
Task: Suppose you hold a portfolio of $n = 50$ such independent assets. What is the probability that your average return is greater than 5%?
Step 1: Calculate Population Parameters ($\mu, \sigma^2$)
First, we analyze the single asset $R$.
$$E[R] = (-10 \times 0.2) + (0 \times 0.5) + (20 \times 0.3) = -2 + 0 + 6 = 4\%$$
$$E[R^2] = ((-10)^2 \times 0.2) + (0^2 \times 0.5) + (20^2 \times 0.3) = (100 \times 0.2) + 0 + (400 \times 0.3) = 20 + 120 = 140$$
$$\text{Var}(R) = E[R^2] - (E[R])^2 = 140 - 4^2 = 124$$
So, the population has $\mu = 4$ and $\sigma^2 = 124$.
Step 2: Apply CLT to the Sample Mean $\bar{R}$
Since $n = 50$ is large, the average return $\bar{R}$ is approximately distributed as:
$$\bar{R} \approx N\!\left(\mu, \frac{\sigma^2}{n}\right) = N\!\left(4, \frac{124}{50}\right) = N(4, 2.48)$$
Standard Deviation of $\bar{R}$ $= \sqrt{2.48} \approx 1.575$.
Step 3: Calculate Probability
We want $P(\bar{R} > 5)$. Standardize:
$$Z = \frac{5 - 4}{1.575} = \frac{1}{1.575} \approx 0.635$$
Using standard normal tables:
$$P(Z > 0.635) = 1 - P(Z < 0.635) \approx 1 - 0.737 = 0.263$$
Conclusion: Even though individual assets have a discrete, “jumpy” distribution, the portfolio average behaves normally. There is a ~26.3% chance the portfolio beats 5%.
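The three steps of this example can be reproduced numerically. The sketch below uses only the return values and probabilities stated in the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Return distribution of one asset (in %), taken from the example.
returns = [-10, 0, 20]
probs = [0.2, 0.5, 0.3]
n = 50  # number of independent assets in the portfolio

# Step 1: population parameters.
mu = sum(r * p for r, p in zip(returns, probs))                # E[R]
second_moment = sum(r**2 * p for r, p in zip(returns, probs))  # E[R^2]
var = second_moment - mu**2                                    # Var(R)

# Step 2: by the CLT, R-bar is approximately N(mu, var / n).
r_bar = NormalDist(mu=mu, sigma=sqrt(var / n))

# Step 3: upper-tail probability P(R-bar > 5).
prob = 1 - r_bar.cdf(5)

print(f"mu = {mu:.2f}, var = {var:.2f}, SE = {r_bar.stdev:.3f}")
print(f"P(R-bar > 5) ≈ {prob:.3f}")
```

The result should agree with the table-based answer to about three decimal places; small differences come from rounding $Z$ before the table lookup.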
Under the CLT, when $n$ is large,
$$\bar{X} \approx N\!\left(\mu, \frac{\sigma^2}{n}\right).$$
If $\sigma$ is unknown we estimate it with the sample standard deviation $S$ and approximate
$$\bar{X} \approx N\!\left(\mu, \frac{S^2}{n}\right).$$
Definition: Estimated Standard Error of the Mean
For a large sample of size $n$, the estimated standard error of the sample mean is
$$\mathrm{SE}(\bar{X}) = \frac{S}{\sqrt{n}},$$
where $S$ is the sample standard deviation.
Using the CLT, for large $n$ we have approximately
$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \approx N(0, 1).$$
Therefore a $100(1-\alpha)\%$ confidence interval for $\mu$ is
$$\bar{X} \pm z^* \cdot \frac{S}{\sqrt{n}},$$
where $z^*$ satisfies $P(-z^* < Z < z^*) = 1 - \alpha$ for $Z \sim N(0,1)$.
| Confidence level | $\alpha$ | $z^*$ |
|---|---|---|
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.96 |
| 99% | 0.01 | 2.576 |
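The critical values can be recovered from the standard normal quantile function, and the interval computed directly from summary statistics. This is a minimal sketch: the function names and the summary statistics ($\bar{x} = 52$, $s = 8$, $n = 64$) are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_star(conf_level: float) -> float:
    """Critical value z* with P(-z* < Z < z*) = conf_level for Z ~ N(0, 1)."""
    alpha = 1 - conf_level
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_confidence_interval(x_bar: float, s: float, n: int, conf_level: float = 0.95):
    """Large-sample CI for mu: x_bar ± z* · s / sqrt(n)."""
    half_width = z_star(conf_level) * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")

# Hypothetical summary statistics, for illustration only.
low, high = mean_confidence_interval(x_bar=52.0, s=8.0, n=64)
print(f"95% CI for mu: ({low:.2f}, {high:.2f})")
```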
To test
$$H_0: \mu = \mu_0 \quad\text{against}\quad H_1: \mu \ne \mu_0,$$
with a large sample and unknown $\sigma$, we use the test statistic
$$Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \approx N(0, 1) \quad\text{under } H_0.$$
We reject $H_0$ if $|Z|$ is too large, that is, if $Z$ falls in the critical region $|Z| > z^*$ determined by the chosen significance level $\alpha$.
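The test procedure can be sketched in a few lines. The function name and the summary statistics below ($\bar{x} = 52$, $s = 8$, $n = 64$, $\mu_0 = 50$) are hypothetical and serve only to show the mechanics of the two-sided test.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(x_bar: float, s: float, n: int, mu0: float, alpha: float = 0.05):
    """Two-sided large-sample z-test of H0: mu = mu0 against H1: mu != mu0.

    Returns the test statistic, the critical value z*, and the decision.
    """
    z = (x_bar - mu0) / (s / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject = abs(z) > z_crit  # critical region: |Z| > z*
    return z, z_crit, reject


# Hypothetical summary statistics, for illustration only.
z, z_crit, reject = one_sample_z_test(x_bar=52.0, s=8.0, n=64, mu0=50.0)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}")
print("reject H0" if reject else "do not reject H0")
```

For a one-sided alternative, the critical value would instead come from `inv_cdf(1 - alpha)` and the comparison would drop the absolute value.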
Suppose we have two populations:
Population A with mean $\mu_A$ and variance $\sigma_A^2$
Population B with mean $\mu_B$ and variance $\sigma_B^2$
We take independent random samples:
$$X_1, \ldots, X_{n_A} \quad\text{from population A}, \qquad Y_1, \ldots, Y_{n_B} \quad\text{from population B},$$
and form the sample means $\bar{X}$ and $\bar{Y}$.
If both sample sizes are large, CLT gives
$$\bar{X} \approx N\!\left(\mu_A, \frac{\sigma_A^2}{n_A}\right), \qquad \bar{Y} \approx N\!\left(\mu_B, \frac{\sigma_B^2}{n_B}\right),$$
and, because the samples are independent,
$$\bar{X} - \bar{Y} \approx N\!\left(\mu_A - \mu_B, \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}\right).$$
When the population variances are unknown we estimate them by the sample variances $S_A^2$ and $S_B^2$ and use the estimated standard error
$$\mathrm{SE}(\bar{X} - \bar{Y}) = \sqrt{\frac{S_A^2}{n_A} + \frac{S_B^2}{n_B}}.$$
A large-sample $100(1-\alpha)\%$ confidence interval for $\mu_A - \mu_B$ is
$$(\bar{X} - \bar{Y}) \pm z^* \cdot \mathrm{SE}(\bar{X} - \bar{Y}).$$
To test
$$H_0: \mu_A - \mu_B = \Delta_0$$
against one- or two-sided alternatives, we use
$$Z = \frac{(\bar{X} - \bar{Y}) - \Delta_0}{\mathrm{SE}(\bar{X} - \bar{Y})} \approx N(0,1) \quad\text{under } H_0$$
for large samples.
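The two-sample formulas translate directly into code. In this sketch the function names and the summary statistics are hypothetical (they are not taken from any exercise), and the inputs are sample variances rather than standard deviations.

```python
from math import sqrt
from statistics import NormalDist


def diff_standard_error(var_a: float, n_a: int, var_b: float, n_b: int) -> float:
    """Estimated SE of X-bar - Y-bar from sample variances S_A^2 and S_B^2."""
    return sqrt(var_a / n_a + var_b / n_b)


def two_sample_z(x_bar, y_bar, var_a, n_a, var_b, n_b, delta0=0.0, conf_level=0.95):
    """Return (z statistic for H0: mu_A - mu_B = delta0, CI for mu_A - mu_B)."""
    se = diff_standard_error(var_a, n_a, var_b, n_b)
    diff = x_bar - y_bar
    z = (diff - delta0) / se
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return z, (diff - z_star * se, diff + z_star * se)


# Hypothetical summary statistics, for illustration only.
z, (low, high) = two_sample_z(
    x_bar=10.0, y_bar=9.0, var_a=4.0, n_a=100, var_b=9.0, n_b=100
)
print(f"z = {z:.3f}")
print(f"95% CI for mu_A - mu_B: ({low:.3f}, {high:.3f})")
```

The statistic `z` is then compared with a one- or two-sided critical value, depending on the alternative hypothesis.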
Example: Skewed vs Uniform Distributions
An engineer compares an old server (A) with a new server (B).
Server A (Old): Latency is Skewed. Most requests are fast, but some hang.
$$\mu_A = 205 \text{ ms}, \quad \sigma_A = 50 \text{ ms}$$
Server B (New): Latency is Uniformly Distributed between 150ms and 210ms.
$$X_B \sim U[150, 210]$$
We collect $n_A = 100$ requests from A and $n_B = 100$ from B.
Question: What is the probability that the sample mean of A is at least 20ms slower (higher) than B? i.e., $P(\bar{X}_A - \bar{X}_B > 20)$.
Step 1: Determine Parameters for B
For Uniform $[a, b]$:
$$\mu_B = \frac{a+b}{2} = \frac{150+210}{2} = 180 \text{ ms}$$
$$\sigma_B^2 = \frac{(b-a)^2}{12} = \frac{(60)^2}{12} = \frac{3600}{12} = 300$$
Step 2: Distribution of the Difference
Mean Difference: $\mu_{\text{diff}} = \mu_A - \mu_B = 205 - 180 = 25 \text{ ms}$.
Variance of Difference:
$$\text{Var}(\bar{X}_A - \bar{X}_B) = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B} = \frac{50^2}{100} + \frac{300}{100} = \frac{2500}{100} + 3 = 25 + 3 = 28$$
Standard Error = $\sqrt{28} \approx 5.29$ ms.
Step 3: Calculate Probability
We want $P(D > 20)$ where $D = \bar{X}_A - \bar{X}_B \approx N(25, 28)$.
$$Z = \frac{20 - 25}{5.29} = \frac{-5}{5.29} \approx -0.945$$
$$P(Z > -0.945) = P(Z < 0.945) \approx 0.8277$$
Insight: Despite A being skewed and B being uniform, we can easily calculate probabilities about their difference using the Normal distribution!
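The server calculation can be checked numerically in the same way. The sketch below uses only the parameters stated in the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Server A: parameters given in the example (ms).
mu_a, sigma_a, n_a = 205.0, 50.0, 100

# Server B: Uniform[a, b] latency, so mean = (a + b) / 2, variance = (b - a)^2 / 12.
a, b, n_b = 150.0, 210.0, 100
mu_b = (a + b) / 2
var_b = (b - a) ** 2 / 12

# CLT: D = X-bar_A - X-bar_B is approximately N(mean_diff, var_diff).
mean_diff = mu_a - mu_b
var_diff = sigma_a**2 / n_a + var_b / n_b
d = NormalDist(mu=mean_diff, sigma=sqrt(var_diff))

prob = 1 - d.cdf(20)
print(f"mean difference = {mean_diff:.1f} ms, SE = {d.stdev:.2f} ms")
print(f"P(D > 20) ≈ {prob:.4f}")
```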
Exercise 1: 6691/01/June11/1
Explain what you understand by the Central Limit Theorem. (3)
A six-sided die is changed so that there are three faces marked 1, two faces marked 3 and one face marked 6.
The die is rolled 40 times and the mean of the 40 scores is recorded.
a. Find an approximate distribution for the mean of the scores. (3)
b. Use your approximation to estimate the probability that the mean is greater than 3. (4)
Exercise 2: 6691/01R/June13/6
The continuous random variable $X$ is uniformly distributed over the interval
$$[a - 1, a + 5],$$
where $a$ is a constant.
Fifty observations of $X$ are taken, giving a sample mean of $17.2$.
a. Use the Central Limit Theorem to find an approximate distribution for $\bar{X}$. (3)
b. Hence find a 95% confidence interval for $a$. (4)
Exercise 3: 6691/01/May14/7
A machine fills packets with $X$ grams of powder where $X$ is normally distributed with mean $\mu$.
Each packet is supposed to contain 1 kg of powder.
To comply with regulations, the weight of powder in a randomly selected packet should be such that
$$P(X < \mu - 30) = 0.0005.$$
a. Show that this requires the standard deviation to be $9.117$ g, correct to 3 decimal places. (3)
A random sample of 10 packets is selected from the machine.
The weight, in grams, of powder in each packet is as follows:
$$999.8,\; 991.6,\; 1000.3,\; 1006.1,\; 1008.2,\; 997.0,\; 993.2,\; 1000.0,\; 997.1,\; 1002.1.$$
b. Assuming that the standard deviation of the population is $9.117$ g, test, at the 1% significance level, whether or not the machine is delivering packets with mean weight of less than 1 kg. State your hypotheses clearly. (7)
Exercise 4: WST03/01/May16/8
A six-sided die is labelled with the numbers 1, 2, 3, 4, 5 and 6.
A group of 50 students want to test whether or not the die is fair for the number six.
The 50 students each roll the die 30 times and record the number of sixes they each obtain.
Let $\bar{X}$ denote the mean number of sixes obtained by the 50 students.
We wish to test
$$H_0 : p = \frac{1}{6} \quad\text{against}\quad H_1 : p \ne \frac{1}{6},$$
where $p$ is the probability of rolling a 6.
a. Use the Central Limit Theorem to find an approximate distribution for $\bar{X}$, if $H_0$ is true. (3)
b. Hence find, in terms of $\bar{X}$, the critical region for this test. Use a 5% level of significance. (4)
Exercise 5: 6691/01/June13/6
Fruit-n-Veg4U Market Gardens grow tomatoes.
They want to improve their yield of tomatoes by at least 1 kg per plant by buying a new variety.
The variance of the yield of the old variety of plant is $0.5~\text{kg}^2$ and the variance of the yield for the new variety is $0.75~\text{kg}^2$.
A random sample of 60 plants of the old variety has a mean yield of $5.5$ kg.
A random sample of 70 plants of the new variety has a mean yield of $7$ kg.
a. Stating your hypotheses clearly, test, at the 5% level of significance, whether or not there is evidence that the mean yield of the new variety is more than 1 kg greater than the mean yield of the old variety. (9)
b. Explain the relevance of the Central Limit Theorem to the test in part (a). (2)
Exercise 6: WST03/01/May14/3
A grocer believes that the average weight of a grapefruit from farm A is greater than the average weight of a grapefruit from farm B.
The weights, in grams, of 80 grapefruit selected at random from farm A have a mean value of 532 g and a standard deviation $s_A$ of 35 g.
A random sample of 100 grapefruit from farm B has a mean weight of 520 g and a standard deviation $s_B$ of 28 g.
Stating your hypotheses clearly and using a 1% level of significance, test whether or not the grocer’s belief is supported by the data. (7)
Exercise 7: 6691/01R/June13/7
A farmer monitored the amount of lead in soil in a field next to a factory.
He took 100 samples of soil, randomly selected from different parts of the field,
and found the mean weight of lead to be 67 mg/kg with standard deviation 25 mg/kg.
After the factory closed, the farmer took 150 samples of soil,
randomly selected from different parts of the field,
and found the mean weight of lead to be 60 mg/kg with standard deviation 10 mg/kg.
a. Test, at the 5% level of significance, whether or not the mean weight of lead in the soil decreased after the factory closed. State your hypotheses clearly. (7)
b. Explain the significance of the Central Limit Theorem to the test in part (a). (1)
c. State an assumption you have made to carry out this test. (1)
Exercise 8: 6691/01R/May14/5
A student believes that there is a difference in the mean lengths of English and French films.
He goes to the university video library and randomly selects a sample of 120 English films and a sample of 70 French films.
He notes the length, $x$ minutes, of each of the films in his samples.
His data are summarised in the table below.

| | $\sum x$ | $\sum x^2$ | $s^2$ | $n$ |
|---|---|---|---|---|
| English films | 10650 | 956909 | 98.5 | 120 |
| French films | 6510 | 615849 | 151 | 70 |

a. Verify that the unbiased estimate of the variance, $s^2$, of the lengths of English films is $98.5~\text{minutes}^2$. (2)
b. Stating your hypotheses clearly, test, at the 1% level of significance, whether or not the mean lengths of English and French films are different. (7)
c. Explain the significance of the Central Limit Theorem to the test in part (b). (1)
d. The university video library contained 724 English films and 473 French films. Explain how the student could have taken a stratified sample of 190 of these films. (3)