In Chapter 1, we learned how to collect a representative sample from a population. We carefully selected 200 students from HelloTea’s 3,000-student customer base using proper sampling methods. Now we face the next crucial question:
What do we do with this data?
After conducting our survey, we calculated the sample mean satisfaction score $\bar{X} = 4.2$ (on a scale of 0 to 5). This is our best guess for the true mean satisfaction $\mu$ of all 3,000 students. But how reliable is this estimate?
The Central Challenge
Our observation: sample mean $\bar{X} = 4.2$ from 200 students
Our question: How close is this to the true mean $\mu$ of all 3,000 students?
The fundamental problem: We can never know $\mu$ exactly (unless we survey everyone), but we can quantify how reliable our estimate is.
To answer these questions rigorously, we need a mathematical framework. This chapter develops the theory of statistical estimation and shows how to quantify uncertainty using confidence intervals.
Before we proceed, we must be explicit about the assumptions underlying our methods.
Fundamental Assumptions
1. (Normality): Each student’s satisfaction score is drawn from a normal distribution:
$$X_i \sim N(\mu, \sigma^2) \quad \text{for } i = 1, 2, \ldots, 200$$
2. (Independence): Each student’s rating is independent of all others.
3. (Random Sampling): Our sample was selected using proper random sampling methods (as learned in Chapter 1).
Note: These are strong assumptions. We’ll revisit their realism at the end of this chapter.
Learning Objectives
Identify parameters and choose natural estimators in context.
Distinguish the estimator ($\hat{\theta}$) from the observed estimate ($\hat{\theta}_{\text{obs}}$).
Explain why an estimator must be evaluated mathematically, not just intuitively.
We used the sample mean $\bar{X} = 4.2$ to estimate the population mean $\mu$. This seems natural: after all, the average of our sample should tell us something about the average of the population. But how do we know this is a good estimate?
The Fundamental Question
Given that we observed $\bar{X} = 4.2$, how can we judge the quality of this estimate?
What does “good estimate” even mean mathematically?
To answer this, we need to formalize what we mean by an “estimator.”
Real-World Scenario Before Definition
A food delivery platform wants to know the average delivery time across all orders this month. It only audits 500 orders.
The unknown target is the true monthly average delivery time for all orders.
A practical rule is to use the sample average from the 500 audited orders.
The business decision (staffing and compensation policy) depends on whether this rule is trustworthy.
This is exactly the estimator idea: use a rule based on sample data to infer an unknown population quantity.
Definition: Estimator
An estimator is a statistic (a function of sample data) used to estimate an unknown population parameter.
A point estimator produces a single numerical value as an estimate.
Notation: We typically use $\hat{\theta}$ to denote an estimator of the parameter $\theta$.
Example: HelloTea Estimators
In our satisfaction survey:
Population parameter: $\mu$ = true mean satisfaction of all 3,000 students (unknown)
Estimator: $\bar{X}$ = sample mean = $\frac{1}{n}\sum_{i=1}^{n} X_i$
Estimate: $\bar{x} = 4.2$ (the specific value we observed)
Note: $\bar{X}$ is a random variable (an estimator), while $\bar{x} = 4.2$ is the specific number we calculated.
Now that we have an estimator $\bar{X}$ for $\mu$, we face two critical questions:
Two Key Questions About Estimators
Question 1 (Systematic Error): Is our estimator systematically wrong? Does it tend to overestimate or underestimate the true parameter?
$\Rightarrow$ This leads to the concept of bias (Section 2)
Question 2 (Variability): How much does our estimator vary from sample to sample? If we repeated the survey with a different random sample of 200 students, how different would $\bar{X}$ be?
$\Rightarrow$ This leads to the concept of standard error (Section 3)
Learning Objectives
Compute and interpret bias as long-run systematic error.
Prove whether an estimator is unbiased using expectation rules.
Explain why the $n-1$ correction appears in the sample variance.
The first question is whether our estimator is systematically biased - does it tend to be too high or too low?
Real-World Scenario Before Bias
A wearable device estimates daily calories burned. Suppose the algorithm is consistently lower than lab measurements by about 80 kcal, even across many users.
Individual daily errors vary up and down.
But the average error is negative over repeated users and repeated days.
This is a bias problem: the method is systematically off-center, not just noisy.
Definition: Bias
The bias of an estimator $\hat{\theta}$ for a parameter $\theta$ is defined as:
$$\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$$
An estimator is unbiased if $E[\hat{\theta}] = \theta$, i.e., if $\text{Bias}(\hat{\theta}) = 0$.
Interpretation: An unbiased estimator is “correct on average” - if we repeated the sampling process infinitely many times, the average of all our estimates would equal the true parameter.
Let’s check whether $\bar{X}$ is an unbiased estimator of $\mu$.
Question: Is $E[\bar{X}] = \mu$?
To answer this, we need to calculate $E[\bar{X}]$. The linearity rules were covered in the Combinations of Random Variables handout, so here we focus on how they are used in estimation.
Now we can check if $\bar{X}$ is unbiased:
Example: Sample Mean is Unbiased
Calculate:
$$E[\bar{X}] = E\left[\frac{1}{200}\sum_{i=1}^{200} X_i\right] = \frac{1}{200} E\left[\sum_{i=1}^{200} X_i\right] = \frac{1}{200} \sum_{i=1}^{200} E[X_i] = \frac{1}{200} \sum_{i=1}^{200} \mu = \mu$$
Conclusion: $E[\bar{X}] = \mu$, so the sample mean is an unbiased estimator of the population mean!
Interpretation: On average (over many repeated samples), $\bar{X}$ equals the true mean $\mu$. Our specific observation $\bar{x} = 4.2$ might be above or below $\mu$, but the method is not systematically biased.
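This “correct on average” behaviour can be checked empirically. The sketch below (with illustrative values $\mu = 4.0$, $\sigma = 1.2$, not actual survey data) draws many samples of size 200 and averages the resulting sample means; the average settles near $\mu$.

```python
# Empirical check that the sample mean is unbiased: the average of many
# sample means should settle near the true mean mu.
# mu, sigma, and the repeat count are illustrative assumptions.
import random

random.seed(42)
mu, sigma, n = 4.0, 1.2, 200
num_repeats = 5000

means = []
for _ in range(num_repeats):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)

avg_of_means = sum(means) / num_repeats
print(round(avg_of_means, 3))  # close to 4.0
```

Each individual sample mean wanders around $\mu$, but their long-run average does not drift: that is exactly what $E[\bar{X}] = \mu$ asserts.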
Not all natural estimators are unbiased. Let’s examine the sample variance.
Example: The Biased Sample Variance
Suppose we want to estimate the population variance $\sigma^2$. The natural estimator might seem to be:
$$\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
This is just the average squared deviation from the sample mean. Is it unbiased?
It can be shown that (see the challenge exercise for details):
$$E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{n-1}{n}\sigma^2$$
Discovery: This is NOT equal to $\sigma^2$! The bias is:
$$\text{Bias} = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{1}{n}\sigma^2 < 0$$
The natural estimator systematically underestimates the true variance.
Why?
When we use $\bar{X}$ (which is calculated from the same data), we make the deviations artificially small. The sample mean $\bar{X}$ minimizes $\sum(X_i - \bar{X})^2$, so using it makes the variance appear smaller than it truly is.
To fix this bias, we use the corrected formula:
Definition: Unbiased Sample Variance
The sample variance (the unbiased estimator of $\sigma^2$) is:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
The divisor $n-1$ is called the degrees of freedom.
Verification:
$$E[S^2] = E\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{n}{n-1} \cdot \frac{n-1}{n}\sigma^2 = \sigma^2 \;\checkmark$$
Example: HelloTea: Using the Correct Formula
In our satisfaction survey with $n = 200$ students:
If we calculated $\sum_{i=1}^{200}(X_i - 4.2)^2 = 288$:
Wrong (biased): variance $\approx \frac{288}{200} = 1.44$
Correct (unbiased): $S^2 = \frac{288}{199} \approx 1.447$
The difference is small here, but the principle matters: always use $n-1$ for the sample variance!
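The effect of the $n$ versus $n-1$ divisor can also be seen by simulation. This sketch uses a small illustrative sample size ($n = 5$) and a made-up population with $\sigma^2 = 4$, where the bias is easy to see:

```python
# Simulate the bias of the divide-by-n variance estimator against the
# divide-by-(n-1) version. n = 5 and sigma^2 = 4 are illustrative choices
# that make the bias clearly visible.
import random

random.seed(0)
mu, sigma, n = 0.0, 2.0, 5          # true variance sigma^2 = 4
num_repeats = 20000

biased_vals, unbiased_vals = [], []
for _ in range(num_repeats):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    biased_vals.append(ss / n)              # divides by n   -> biased low
    unbiased_vals.append(ss / (n - 1))      # divides by n-1 -> unbiased

avg_biased = sum(biased_vals) / num_repeats      # near (n-1)/n * 4 = 3.2
avg_unbiased = sum(unbiased_vals) / num_repeats  # near 4
print(round(avg_biased, 2), round(avg_unbiased, 2))
```

The divide-by-$n$ average lands near $\frac{n-1}{n}\sigma^2 = 3.2$ rather than 4, matching the bias formula derived above.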
Exercise 1: Understanding Bias in Estimation
A quality control team tests battery life of mobile phones. For a random sample $X_1, X_2, \ldots, X_n$ from a population with mean $\mu$ and variance $\sigma^2$, the team considers two estimators for the population variance:
Estimator 1: $V_1 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$
Estimator 2: $V_2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$
a. Show that $\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$ is an unbiased estimator of the population mean $\mu$.
b. Given that $V_2$ is an unbiased estimator of $\sigma^2$, find the bias of $V_1$ when used as an estimator of $\sigma^2$. Express your answer in terms of $n$ and $\sigma^2$.
c. Five batteries were taken at random and tested. The lifetimes, in hours, were as follows:
$$435 \quad 390 \quad 356 \quad 388 \quad 449$$
Calculate the unbiased estimates of $\mu$ and $\sigma^2$.
d. A researcher proposes an alternative estimator for the population mean $\mu$:
$$T = \frac{X_1 + 2X_3 + X_5}{4}$$
i. Calculate $E[T]$ in terms of $\mu$. Is $T$ an unbiased estimator of $\mu$? Justify your answer.
ii. Find the bias of $T$ as an estimator of $\mu$.
iii. Explain why the standard estimator $\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$ is preferred over $T$.
Learning Objectives
Explain why unbiasedness alone is insufficient for good estimation.
Derive $\text{SE}(\bar{X}) = \sigma/\sqrt{n}$ from variance rules.
Use SE to compare precision across different sample sizes.
We’ve established that $\bar{X}$ is unbiased ($E[\bar{X}] = \mu$), which is good news! But being unbiased doesn’t tell the whole story. Consider this thought experiment:
Thought Experiment
Imagine we surveyed a different random sample of 200 students from HelloTea’s customer base. Would we get exactly $\bar{X} = 4.2$ again?
Almost certainly not! We’d get a different value, maybe $\bar{X} = 4.15$ or $\bar{X} = 4.28$.
The key question: How much does $\bar{X}$ typically vary from sample to sample?
This variability is captured by the standard error.
Real-World Scenario Before Standard Error
A marketing team runs the same online ad campaign in many weeks. Each week they report the mean order value from a random sample of customers.
The estimating method can be unbiased on average.
Yet weekly sample means still move up and down due to random sampling.
Standard error quantifies this week-to-week fluctuation of the estimator.
Definition: Standard Error
The standard error of an estimator $\hat{\theta}$ is the standard deviation of its sampling distribution:
$$\text{SE}(\hat{\theta}) = \sqrt{\text{Var}(\hat{\theta})}$$
Interpretation: A smaller standard error means a more precise (less variable) estimator.
For the sample mean under independent sampling:
$$\text{Var}(\bar{X})=\frac{\sigma^2}{n}
\quad\Rightarrow\quad
\boxed{\text{SE}(\bar{X})=\frac{\sigma}{\sqrt{n}}}$$
The formula $\text{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$ reveals important insights:
Insights from the SE Formula
1. Effect of population variability ($\sigma$):
Larger $\sigma$ (more variable population) $\Rightarrow$ larger SE (less precise estimate)
Smaller $\sigma$ (more homogeneous population) $\Rightarrow$ smaller SE (more precise estimate)
2. Effect of sample size ($n$):
Larger $n$ (more data) $\Rightarrow$ smaller SE (more precise estimate)
The improvement is proportional to $\sqrt{n}$, not $n$
3. The $\sqrt{n}$ relationship:
To cut SE in half, you need 4 times as much data
To cut SE to 1/10, you need 100 times as much data
Diminishing returns: doubling sample size doesn’t double precision
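The diminishing-returns arithmetic can be verified directly. The numbers below use the chapter’s illustrative $\sigma = 1.2$:

```python
# Direct arithmetic check of the sqrt(n) rule using the chapter's
# illustrative sigma = 1.2: quadrupling n halves the SE, and 100x
# the data shrinks the SE by a factor of 10.
import math

sigma = 1.2

def se(n):
    """Standard error of the sample mean for sample size n."""
    return sigma / math.sqrt(n)

print(round(se(200), 4))    # baseline
print(round(se(800), 4))    # 4x the data -> half the SE
print(round(se(20000), 4))  # 100x the data -> one tenth the SE
```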
Let’s apply this to our satisfaction survey:
Example: HelloTea Standard Error
Scenario: From historical data or a pilot study, suppose we know that satisfaction ratings have standard deviation $\sigma = 1.2$ points (on the 0-5 scale).
Our sample: $n = 200$ students, $\bar{X} = 4.2$
Calculate SE:
$$\text{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{1.2}{\sqrt{200}} = \frac{1.2}{14.142} \approx 0.0849$$
Interpretation: The sample mean $\bar{X}$ typically varies by about $\pm 0.085$ points from the true mean $\mu$.
If we repeated the survey many times with different random samples of 200 students, about 68% of the sample means would fall within $\mu \pm 0.085$.
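The claim that $\bar{X}$ spreads out by about $\sigma/\sqrt{n}$ across repeated surveys can be checked by simulation. This is a sketch under the assumption that ratings really are normal with the stated $\mu$ and $\sigma$:

```python
# Simulate repeated surveys to verify SE(X-bar) = sigma / sqrt(n).
# Assumes ratings are normal with the stated mu and sigma (illustrative).
import math
import random

random.seed(1)
mu, sigma, n = 4.2, 1.2, 200
num_repeats = 4000

means = []
for _ in range(num_repeats):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)

# Empirical spread of the sample means versus the formula
m = sum(means) / num_repeats
empirical_se = math.sqrt(sum((x - m) ** 2 for x in means) / (num_repeats - 1))
theoretical_se = sigma / math.sqrt(n)   # 1.2 / sqrt(200), about 0.085

print(round(empirical_se, 3), round(theoretical_se, 3))
```

The empirical standard deviation of the simulated sample means agrees closely with $\sigma/\sqrt{n}$.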
In practice, we usually don’t know the population standard deviation $\sigma$. What do we do?
Estimated Standard Error
When $\sigma$ is unknown (which is almost always the case), we estimate it using the sample standard deviation $S$:
$$\text{Estimated SE}(\bar{X}) = \frac{S}{\sqrt{n}}$$
where $S = \sqrt{S^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$.
HelloTea example: If our sample yields $S = 1.2$, then:
$$\text{Estimated SE}(\bar{X}) = \frac{1.2}{\sqrt{200}} \approx 0.085$$
Note: For large samples ($n \geq 30$), the difference between using $\sigma$ and $S$ is negligible.
Learning Objectives
Connect sampling distributions to interval construction.
Derive and apply the $100(1-\alpha)\%$ confidence interval for $\mu$.
Interpret interval width using sample size, variability, and confidence level.
So far, we’ve learned:
$\bar{X} = 4.2$ is an unbiased estimate of $\mu$ (no systematic error)
$\text{SE}(\bar{X}) \approx 0.085$ tells us the typical variability
But when we report to HelloTea’s manager, saying “the mean satisfaction is 4.2 with an SE of 0.085” isn’t very intuitive. A better approach is to give a range of plausible values for $\mu$.
The Limitation of Point Estimates
Point estimate: $\bar{X} = 4.2$ gives a single number, but…
We know this is almost certainly not exactly equal to $\mu$
It doesn’t convey our uncertainty
It’s hard to use for decision-making
Better approach: Give a range like “$\mu$ is likely between 4.0 and 4.4”
This is what confidence intervals provide!
Real-World Scenario Before Confidence Intervals
A hospital reports average emergency waiting time from a sample of patient records.
Reporting a single value (for example, 47 minutes) can be misleadingly precise.
Operations managers need a plausible range to plan staffing safely.
A confidence interval gives a principled uncertainty band for the true average waiting time.
To construct meaningful intervals, we need to understand the distribution of $\bar{X}$: not just its mean and variance, but its entire probability distribution. This is where our normality assumption becomes crucial!
Why We Need the Sampling Distribution
Question: How do we know how far $\bar{X}$ typically deviates from $\mu$?
Answer: We need to know the probability distribution of $\bar{X}$ itself!
This is called the sampling distribution of $\bar{X}$: the distribution that describes how $\bar{X}$ varies across different possible samples.
For confidence intervals, we use the key result:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
under independent normal sampling.
Link to the prerequisite handout
Why linear combinations of normal variables are normal is covered in detail in the Combinations of Random Variables handout. Here, we focus on how this result is used to construct and interpret confidence intervals.
Now we can build our confidence interval by standardizing $\bar{X}$:
Example: Deriving the 95% Confidence Interval
Start with what we know:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$
Use the 95% probability:
$$P(-1.96 < Z < 1.96) = 0.95$$
Substitute the formula for $Z$:
$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$
Multiply all parts by $\sigma/\sqrt{n}$:
$$P\left(-1.96 \cdot \frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) = 0.95$$
Rearrange to isolate $\mu$ in the middle:
$$P\left(\bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) = 0.95$$
This can be written compactly as:
$$P\left(\bar{X} - 1.96 \cdot \text{SE}(\bar{X}) < \mu < \bar{X} + 1.96 \cdot \text{SE}(\bar{X})\right) = 0.95$$
Or more simply: $\mu \in \left[\bar{X} \pm 1.96 \cdot \text{SE}(\bar{X})\right]$ with probability 0.95.
Definition: Confidence Interval for the Mean
A $100(1-\alpha)\%$ confidence interval for the population mean $\mu$ is:
$$\boxed{\bar{X} \pm z^* \times \text{SE}(\bar{X}) = \bar{X} \pm z^* \times \frac{\sigma}{\sqrt{n}}}$$
where $z^*$ is chosen so that $P(-z^* < Z < z^*) = 1 - \alpha$ for $Z \sim N(0,1)$.
Common confidence levels:

| Confidence Level | $\alpha$ | $z^*$ |
|---|---|---|
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.96 |
| 99% | 0.01 | 2.576 |

When $\sigma$ is unknown: replace $\sigma$ with the sample standard deviation $S$:
$$\bar{X} \pm z^* \times \frac{S}{\sqrt{n}}$$
Let’s put everything together for our satisfaction survey:
Example: HelloTea 95% Confidence Interval
Given data:
Sample size: $n = 200$ students
Sample mean: $\bar{X} = 4.2$ points
Sample standard deviation: $S = 1.2$ points
Confidence level: 95% (so $z^* = 1.96$)
Step 1: Calculate the standard error
$$\text{SE}(\bar{X}) = \frac{S}{\sqrt{n}} = \frac{1.2}{\sqrt{200}} = \frac{1.2}{14.142} \approx 0.0849 \approx 0.085$$
Step 2: Calculate the margin of error
$$\text{Margin of Error} = z^* \times \text{SE}(\bar{X}) = 1.96 \times 0.085 = 0.1666 \approx 0.167$$
Step 3: Construct the interval
$$\begin{aligned}
\text{95\% CI} &= \bar{X} \pm \text{Margin of Error}\\
&= 4.2 \pm 0.167\\
&= [4.033, 4.367]
\end{aligned}$$
Report: “We are 95% confident that the true mean satisfaction score for all 3,000 students lies between 4.03 and 4.37 points.”
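The three steps above can be wrapped in a small helper function. The function name and the $z^*$ lookup table are our own choices for this sketch, not part of the chapter’s material:

```python
# The three CI steps as a helper function. The function name and the
# z* lookup table are our own choices for this sketch.
import math

Z_STAR = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def mean_confidence_interval(xbar, s, n, level=0.95):
    """Normal-based interval: xbar +/- z* * s / sqrt(n)."""
    se = s / math.sqrt(n)                  # Step 1: standard error
    margin = Z_STAR[level] * se            # Step 2: margin of error
    return xbar - margin, xbar + margin    # Step 3: the interval

lo, hi = mean_confidence_interval(4.2, 1.2, 200)
print(round(lo, 2), round(hi, 2))  # 4.03 4.37
```

Running it with the HelloTea numbers reproduces the interval computed by hand above.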
The width of a confidence interval tells us about the precision of our estimate. A narrower interval means we’ve pinned down $\mu$ more precisely.
What Affects CI Width?
The width of a confidence interval is $2 \times z^* \times \frac{\sigma}{\sqrt{n}}$.
Factor 1: Sample size ($n$)
Larger $n$ $\Rightarrow$ narrower CI (more precise)
Width decreases as $1/\sqrt{n}$
To halve the width, need 4 times the sample size
Factor 2: Population variability ($\sigma$)
Larger $\sigma$ (more variable population) $\Rightarrow$ wider CI (less precise)
Can’t control this: it’s a property of the population
More homogeneous populations give more precise estimates
Factor 3: Confidence level
Higher confidence level $\Rightarrow$ wider CI
90% CI: $z^* = 1.645$ (narrower); 95% CI: $z^* = 1.96$ (moderate); 99% CI: $z^* = 2.576$ (wider)
Trade-off: more confidence requires a wider net
Example: HelloTea: Comparing Different Confidence Levels
With $\bar{X} = 4.2$, $S = 1.2$, $n = 200$, so $\text{SE} = 0.085$:

| Confidence Level | Calculation | Interval |
|---|---|---|
| 90% | $4.2 \pm 1.645(0.085)$ | [4.06, 4.34] |
| 95% | $4.2 \pm 1.96(0.085)$ | [4.03, 4.37] |
| 99% | $4.2 \pm 2.576(0.085)$ | [3.98, 4.42] |

Observation: More confidence means casting a wider net. We’re more confident the interval contains $\mu$, but the interval tells us less precisely where $\mu$ is.
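The “95% of intervals capture $\mu$” interpretation can be tested directly: build a 95% interval from each of many simulated surveys and count how often the interval contains the true mean. A sketch, assuming a normal population with known $\sigma$ (illustrative values):

```python
# Coverage check: build a 95% interval from each of many simulated
# surveys and count how often it contains the true mean. Assumes a
# normal population with known sigma (illustrative values).
import math
import random

random.seed(7)
mu, sigma, n = 4.2, 1.2, 200
num_repeats = 2000
z_star = 1.96

margin = z_star * sigma / math.sqrt(n)  # same half-width every time
hits = 0
for _ in range(num_repeats):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    if xbar - margin < mu < xbar + margin:
        hits += 1

coverage = hits / num_repeats
print(round(coverage, 3))  # close to 0.95
```

About 5% of the simulated intervals miss $\mu$ entirely: a confidence level is a property of the repeated procedure, not of any single interval.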
Learning Objectives
Consolidate the full estimation workflow from point estimate to interval estimate.
Critically evaluate assumptions behind normal-based inference.
Prepare for the Central Limit Theorem as a robustness tool.
This chapter developed the mathematical framework for statistical estimation:
Key Concepts Summary
Estimators: Statistics used to estimate unknown population parameters
Example: $\bar{X}$ estimates $\mu$
Bias: Measures systematic error
Formula: $\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$
Unbiased: $E[\hat{\theta}] = \theta$
Result: $\bar{X}$ is unbiased for $\mu$
Standard Error: Measures precision/variability
Formula: $\text{SE}(\hat{\theta}) = \sqrt{\text{Var}(\hat{\theta})}$
For $\bar{X}$: $\text{SE}(\bar{X}) = \sigma/\sqrt{n}$
Smaller SE = more precise estimate
Confidence Intervals: Quantify uncertainty
Formula: $\bar{X} \pm z^* \times \text{SE}(\bar{X})$
Interpretation: the method captures $\mu$ with the specified probability
Practical tool for reporting uncertainty
| Concept | Formula | HelloTea Value |
|---|---|---|
| Sample Mean | $\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i$ | 4.2 |
| Expectation of $\bar{X}$ | $E[\bar{X}] = \mu$ | $\mu$ (unbiased) |
| Variance of $\bar{X}$ | $\text{Var}(\bar{X}) = \frac{\sigma^2}{n}$ | $\frac{1.44}{200}$ |
| Standard Error | $\text{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$ | 0.085 |
| 95% CI | $\bar{X} \pm 1.96 \times \text{SE}(\bar{X})$ | [4.03, 4.37] |
| Sample Variance | $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2$ | 1.44 |
Throughout this chapter, we relied on several key assumptions. Let’s critically evaluate them:
Assumption 1: Normality
We assumed: $X_i \sim N(\mu, \sigma^2)$, i.e. satisfaction scores are normally distributed
Reality check:
Satisfaction ratings are discrete (1, 2, 3, 4, 5), not continuous
The normal distribution is continuous, ranging from $-\infty$ to $+\infty$
May have ceiling effects (many students give 5/5) or floor effects
Distribution might be skewed (not symmetric)
Rating behavior might not follow a bell curve at all
Assumption 2: Independence
We assumed: Each student’s rating is independent of others
Reality check:
Students might influence each other (“My friend said it’s great!”)
If we sampled students who visited together, ratings might be correlated
Social media posts could create clustering of opinions
The good news: Random sampling helps ensure independence. If we truly randomly selected students from across the population, this assumption is reasonable.
A Fundamental Problem
We derived that: if $X_i \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu, \sigma^2/n)$.
This was crucial for: constructing confidence intervals using the standard normal distribution.
But if $X_i$ is NOT normal: does our entire methodology collapse?
The answer to our concerns comes from one of the most powerful results in all of statistics:
Preview: Central Limit Theorem
The Amazing Result:
Even if the population is NOT normally distributed, the sample mean $\bar{X}$ becomes approximately normally distributed as the sample size $n$ gets large!
$$\text{For large } n: \quad \bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{regardless of the shape of the population distribution}$$
Why it’s remarkable:
Population could be skewed, discrete, bounded, bimodal - doesn’t matter!
The average tends toward normality through the magic of averaging
This is why the normal distribution is so central to statistics
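A quick simulation illustrates the preview: even with a strongly skewed exponential population, the distribution of $\bar{X}$ is close to normal. As a crude normality check, the sketch counts how often $\bar{X}$ lands within one standard error of $\mu$ (about 68% if $\bar{X}$ is roughly normal); the population choice is ours, purely for illustration:

```python
# CLT sketch: the population is exponential (heavily right-skewed),
# yet the sample mean behaves like N(mu, sigma^2/n). We count how often
# X-bar falls within one SE of mu -- about 0.68 if X-bar is near-normal.
# The exponential population is our illustrative choice.
import math
import random

random.seed(3)
n = 100
num_repeats = 5000

means = []
for _ in range(num_repeats):
    sample = [random.expovariate(1.0) for _ in range(n)]  # mean 1, sd 1
    means.append(sum(sample) / n)

mu, se = 1.0, 1.0 / math.sqrt(n)  # CLT prediction: X-bar ~ N(1, 1/n)
within_one_se = sum(1 for m in means if abs(m - mu) < se) / num_repeats
print(round(within_one_se, 2))  # near 0.68
```

The next chapter makes this robustness result precise.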
Exercise 2: WST03/01/May16/7
a. A random sample of 8 apples is taken from an orchard and the weight, in grams, of each apple is measured. The results are given below.
$$143 \quad 131 \quad 165 \quad 122 \quad 137 \quad 155 \quad 148 \quad 151$$
Calculate unbiased estimates for the mean and the variance of the weights of apples. (4)
b. A population has an unknown mean $\mu$ and an unknown variance $\sigma^2$. A random sample represented by $X_1, X_2, X_3, \ldots, X_8$ is taken from this population. Explain why $\sum_{i=1}^8 (X_i - \mu)^2$ is not a statistic. (1)
c. Given that $E(S^2) = \sigma^2$, where $S^2$ is an unbiased estimator of $\sigma^2$, and the statistic
$$Y = \frac{1}{8} \left( \sum_{i=1}^8 X_i^2 - 8\overline{X}^2 \right)$$
find $E(Y)$ in terms of $\sigma^2$. (2)
d. Hence find the bias, in terms of $\sigma^2$, when $Y$ is used as an estimator of $\sigma^2$. (2)
Exercise 3: WST03/01/May17/6
a. A company produces a certain type of mug. The masses of these mugs are normally distributed with mean $\mu$ and standard deviation 1.2 grams. A random sample of 5 mugs is taken and the mass, in grams, of each mug is measured. The results are given below.
$$229.1 \quad 229.6 \quad 230.9 \quad 231.2 \quad 231.7$$
Find a 95% confidence interval for $\mu$, giving your limits correct to 1 decimal place. (4)
b. Sonia plans to take 20 random samples, each of 5 mugs. A 95% confidence interval for $\mu$ is to be determined for each sample. Find the probability that more than 3 of these intervals will not contain $\mu$. (3)
Exercise 4: 6691/01/June13/7
a. Lambs are born in a shed on Mill Farm. The birth weights, $x$ kg, of a random sample of 8 newborn lambs are given below.
$$4.12 \quad 5.12 \quad 4.84 \quad 4.65 \quad 3.55 \quad 3.65 \quad 3.96 \quad 3.40$$
Calculate unbiased estimates of the mean and variance of the birth weight of lambs born on Mill Farm. (3)
b. A further random sample of 32 lambs is chosen and the unbiased estimates of the mean and variance of the birth weight of lambs from this sample are 4.55 and 0.25 respectively. Treating the combined sample of 40 lambs as a single sample, estimate the standard error of the mean. (7)
c. The owner of Mill Farm researches the breed of lamb and discovers that the population of birth weights is normally distributed with standard deviation 0.67 kg. Calculate a 95% confidence interval for the mean birth weight of this breed of lamb using your combined sample mean. (3)
Exercise 5: 6691/01/May14/6
a. A random sample $X_1, X_2, \ldots, X_n$ is taken from a population with mean $\mu$. Show that
$$\overline{X} = \frac{1}{n} (X_1 + X_2 + \ldots + X_n)$$
is an unbiased estimator of the population mean $\mu$. (1)
b. A company produces small jars of coffee. Five jars of coffee were taken at random and weighed. The weights, in grams, were as follows:
$$197 \quad 203 \quad 205 \quad 201 \quad 195$$
Calculate unbiased estimates of the population mean and variance of the weights of the jars produced by the company. (4)
c. It is known from previous results that the weights are normally distributed with standard deviation 4.8 g. The manager is going to take a second random sample. He wishes to ensure that there is at least a 95% probability that the estimate of the population mean is within 1.25 g of its true value. Find the minimum sample size required. (4)
Exercise 6: 6691/01/May16/7
a. A restaurant states that its hamburgers contain 20% fat. Paul claims that the mean fat content of their hamburgers is less than 20%. Paul takes a random sample of 50 hamburgers from the restaurant and finds that they contain a mean fat content of 19.5% with a standard deviation of 1.5%. You may assume that the fat content of hamburgers is normally distributed. Find the 90% confidence interval for the mean fat content of hamburgers from the restaurant. (4)
b. State, with a reason, what action Paul should recommend the restaurant takes over the stated fat content of their hamburgers. (2)
c. The restaurant changes the mean fat content of their hamburgers to $\mu\%$ and adjusts the standard deviation to 2%. Paul takes a sample of size $n$ from this new batch of hamburgers. He uses the sample mean $\overline{X}$ as an estimator of $\mu$. Find the minimum value of $n$ such that
$$P(|\overline{X} - \mu| < 0.5) \geq 0.9.$$
(4)
Exercise 7: WST03/Jan2022/1
a. The weights, $x$ kg, of each of 10 watermelons selected at random from Priya’s shop were recorded. The results are summarised as follows:
$$\sum x = 114.2 \qquad \sum x^2 = 1310.464$$
Calculate unbiased estimates of the mean and the variance of the weights of the watermelons in Priya’s shop. (3)
b. Priya researches the weight of watermelons, for the variety she has in her shop, and discovers that the weights of these watermelons are normally distributed with a standard deviation of 0.8 kg. Calculate a 95% confidence interval for the mean weight of watermelons in Priya’s shop. Give the limits of your confidence interval to 2 decimal places. (4)
c. Priya claims that the confidence interval in part (b) suggests that nearly all of the watermelons in her shop weigh more than 10.5 kg. Use your answer to part (b) to estimate the smallest proportion of watermelons in her shop that weigh less than 10.5 kg. (3)