
S3 Chapter 7: International Exam Review

This handout is a high-yield revision guide for S3 topics: Sampling → Combinations of RVs → Estimation & CI → CLT & Mean Tests → Correlation → $\chi^2$ Tests.

We will keep using HelloTea to connect ideas:

  • Population: all students (e.g. 3000).
  • Sample: e.g. $n=200$ students chosen by a sampling method.
  • Data types: ratings (1–5), drink choice (tea/coffee/hot chocolate), screen-time, etc.

Chapter 1 Review: Sampling Methods (Getting Good Data)


Definition: Population, Sample, Sampling Frame

  • Population: the full group of interest.
  • Sample: the selected observations from the population.
  • Sampling frame: the actual list you can sample from.
| Method | Random? | How to do it | Main risk / limitation |
| --- | --- | --- | --- |
| Simple Random (SRS) | Yes | choose $n$ IDs using an RNG / random number table | can be time-consuming; may miss small subgroups by chance |
| Systematic | Partly | choose a random start, then every $k$th item | periodicity (hidden patterns in the list) |
| Stratified | Yes (within strata) | split into strata, SRS inside each | need strata info beforehand; more steps |
| Quota | No | set quotas, then convenience within each | selection bias; no valid sampling error / inference guarantee |
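The three random-ish methods in the table can be sketched in a few lines of Python. This is a minimal illustration using the HelloTea numbers ($N=3000$, $n=200$); the junior/senior strata sizes are invented for the example:

```python
import random

random.seed(1)  # reproducible illustration

N, n = 3000, 200                 # population size, sample size
frame = list(range(1, N + 1))    # step 0: number/label the sampling frame 1..N

# Simple random sample: every ID has an equal chance of selection.
srs = random.sample(frame, n)

# Systematic sample: period k = N // n, random start in 1..k, then every k-th ID.
k = N // n
start = random.randint(1, k)
systematic = frame[start - 1::k][:n]

# Stratified sample (hypothetical strata: 1800 juniors, 1200 seniors),
# proportional allocation, then SRS inside each stratum.
strata = {"junior": frame[:1800], "senior": frame[1800:]}
stratified = []
for members in strata.values():
    share = round(n * len(members) / N)
    stratified += random.sample(members, share)

print(len(srs), len(systematic), len(stratified))  # 200 200 200
```

Quota sampling deliberately has no sketch here: with no randomness there is no valid sampling-error calculation to illustrate.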

Common Exam Pitfalls (From Examiner Reports)

  • Missing the numbering step: Before using random numbers, you MUST explicitly state that you will “number/label the sampling frame (e.g. from 1 to $N$).”
  • Systematic sampling errors: If the period is $k$, students often forget that two adjacent items cannot both be selected.
  • Vague language: Stating a method is “more accurate” or “more representative” usually scores zero. Use precise terms like “reflects the population structure” (for stratified) or “gives every item an equal chance of selection” (for simple random).
  • Quota vs. Stratified: Quota sampling suffers from interviewer bias (the person choosing who to survey), which means no valid sampling error can be calculated.

Chapter 2 Review: Combinations of Random Variables


The Biggest Exam Trap: $3X$ vs $X_1+X_2+X_3$

| Scenario | Notation | Variance |
| --- | --- | --- |
| “3 times the weight of a randomly chosen bag” | $3X$ | $\mathrm{Var}(3X) = 3^2\,\mathrm{Var}(X) = 9\,\mathrm{Var}(X)$ |
| “The total weight of 3 randomly chosen bags” | $X_1 + X_2 + X_3$ | $\mathrm{Var}(X_1+X_2+X_3) = 3\,\mathrm{Var}(X)$ |
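A quick simulation makes the $9\,\mathrm{Var}(X)$ vs $3\,\mathrm{Var}(X)$ distinction concrete. The bag-weight distribution $N(500, 10^2)$ is invented for the example:

```python
import random
import statistics

random.seed(42)
mu, sigma, trials = 500.0, 10.0, 100_000   # hypothetical bag weights ~ N(500, 10^2)

# "3 times one bag": scale a SINGLE draw -> Var = 3^2 * sigma^2 = 900
scaled = [3 * random.gauss(mu, sigma) for _ in range(trials)]

# "Total of 3 bags": sum THREE independent draws -> Var = 3 * sigma^2 = 300
totals = [sum(random.gauss(mu, sigma) for _ in range(3)) for _ in range(trials)]

print(round(statistics.pvariance(scaled)))   # near 900
print(round(statistics.pvariance(totals)))   # near 300
```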

Common Exam Pitfalls (From Examiner Reports)

  • Subtracting variances: Students frequently write $\mathrm{Var}(X-Y) = \mathrm{Var}(X) - \mathrm{Var}(Y)$. This is WRONG! Variances always add for independent variables: $\mathrm{Var}(X-Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
  • Averaging variables: To find the variance of the sample mean of 5 observations $A = \frac{X_1+\cdots+X_5}{5}$, you must square the denominator: $\mathrm{Var}(A) = \frac{5\,\mathrm{Var}(X)}{25} = \frac{\mathrm{Var}(X)}{5}$. Many incorrectly divide by 5 instead of 25.
  • Difference without direction: If the question asks for the probability of a “difference” in weight being greater than 5 g, you must calculate $P(|X-Y| > 5) = P(X-Y > 5) + P(X-Y < -5)$ (two tails).
  • Standardisation sign errors: When equating your standardisation $Z = \frac{x-\mu}{\sigma}$ to a critical value (e.g. 1.2816), make sure the signs match. If the probability area implies $x$ is below the mean, $Z$ must be negative!
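For the two-tailed “difference” trap, `statistics.NormalDist` from the standard library does the arithmetic. The distributions below are invented for illustration:

```python
from statistics import NormalDist

# Hypothetical: X ~ N(250, 4^2) and Y ~ N(248, 3^2), independent, so
# D = X - Y ~ N(2, 4^2 + 3^2) = N(2, 5^2).  "|difference| > 5 g" needs BOTH tails.
D = NormalDist(mu=2, sigma=5)

upper = 1 - D.cdf(5)            # P(X - Y > 5)
lower = D.cdf(-5)               # P(X - Y < -5)
print(round(upper + lower, 4))  # P(|X - Y| > 5), larger than either tail alone
```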

Chapter 3 Review: Estimation, Bias, Standard Error, Confidence Interval


The Three Layers: Parameter, Statistic, Value

  • Parameter: a fixed (usually unknown) population quantity, e.g. $\mu$.
  • Statistic (estimator): a rule computed from the sample, e.g. $\bar{X}$; it is a random variable.
  • Value (estimate): the number a particular sample produces, e.g. $\bar{x}=3.8$.

Definition: Bias

$$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta.$$

Definition: Standard Error

$$\mathrm{SE}(\hat{\theta}) = \sqrt{\mathrm{Var}(\hat{\theta})}.$$

Definition: Generic Form

$$\text{Estimate} \pm (\text{critical value}) \times (\text{standard error}).$$
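As a sketch of the generic form with made-up HelloTea numbers ($n=200$ ratings, $\bar{x}=3.8$, $s=0.9$) and the two-sided 95% critical value $z^*=1.96$:

```python
from math import sqrt

n, xbar, s = 200, 3.8, 0.9    # hypothetical sample summary
z_star = 1.96                 # two-sided 95% critical value

se = s / sqrt(n)              # standard error of the mean
ci = (xbar - z_star * se, xbar + z_star * se)
print(tuple(round(v, 3) for v in ci))  # (3.675, 3.925)
```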

Common Exam Pitfalls (From Examiner Reports)

  • Wrong interpretation: “95% probability that $\mu$ is in this interval” (not correct phrasing). Instead say: “We are 95% confident that the true population mean lies within this interval.”
  • Mixing SD and SE: SD describes individual items; SE describes estimator variability. Don’t forget to divide by $\sqrt{n}$ when calculating the standard error!
  • Hypotheses notation: Always use population parameters (e.g. $\mu$) in hypotheses, never sample statistics ($\bar{x}$). Also, define your subscripts clearly (e.g. $\mu_A$ vs $\mu_B$).
  • CIs as a Binomial process: If asked for the probability that $Y$ out of $n$ calculated confidence intervals contain $\mu$, you must use the Binomial distribution $Y \sim B(n, \text{confidence level})$.
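The last bullet is just a Binomial calculation. A sketch with assumed numbers (10 independent intervals, 95% coverage):

```python
from math import comb

n, p = 10, 0.95   # 10 independent 95% confidence intervals

def pmf(y):
    # P(Y = y) for Y ~ B(n, p)
    return comb(n, y) * p**y * (1 - p)**(n - y)

print(round(pmf(10), 4))                       # P(all 10 contain mu)
print(round(sum(pmf(y) for y in (9, 10)), 4))  # P(at least 9 contain mu)
```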

Chapter 4 Review: CLT & Inference for Means


Theorem: Central Limit Theorem (usable form). If $X_1,\ldots,X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2<\infty$, then for large $n$,

$$\bar{X} \approx N\!\left(\mu,\frac{\sigma^2}{n}\right).$$

One-Sample Mean Test (Large-Sample $z$-test idea)


To test $H_0:\mu=\mu_0$,

$$Z=\frac{\bar{x}-\mu_0}{S/\sqrt{n}} \approx N(0,1)\ \text{under } H_0 \quad(\text{large } n).$$

Decision by critical value or $p$-value.
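A minimal worked sketch of the large-sample $z$-test, with invented numbers ($H_0:\mu = 3.5$ vs $H_1:\mu > 3.5$, $n = 200$, $\bar{x} = 3.62$, $s = 0.9$):

```python
from math import sqrt
from statistics import NormalDist

mu0, n, xbar, s = 3.5, 200, 3.62, 0.9   # hypothetical data

z = (xbar - mu0) / (s / sqrt(n))        # test statistic
p_value = 1 - NormalDist().cdf(z)       # upper-tail p-value (one-tailed test)

print(round(z, 3), round(p_value, 4))
# Reject H0 at the 5% level if z > 1.645, equivalently if p_value < 0.05.
```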

Difference of Two Means (Independent Samples)


For two independent large samples:

$$\bar{X}-\bar{Y} \approx N\!\left(\mu_X-\mu_Y,\ \frac{\sigma_X^2}{n_X}+\frac{\sigma_Y^2}{n_Y}\right).$$

Use the estimated SE with sample SDs.
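Sketch of the two-sample version with invented tea vs coffee rating summaries (the independence of the two samples is assumed):

```python
from math import sqrt
from statistics import NormalDist

nx, xbar, sx = 120, 3.9, 0.8   # hypothetical tea-drinker ratings
ny, ybar, sy = 80, 3.6, 1.1    # hypothetical coffee-drinker ratings

se = sqrt(sx**2 / nx + sy**2 / ny)            # estimated SE of Xbar - Ybar
z = (xbar - ybar) / se                        # under H0: mu_X - mu_Y = 0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed

print(round(z, 2), round(p_value, 4))
```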

Common Exam Pitfalls (From Examiner Reports)

  • Explaining the CLT: Many students lose marks by saying “the sample is normally distributed.” You must say “the sample mean is approximately normally distributed.”
  • Combining samples: When asked to treat two samples as a single combined sample, do not calculate a weighted standard deviation. Find the new overall mean and calculate the standard error for the new total size $n_1+n_2$.
  • Using the CLT with small $n$ when the population is clearly skewed/heavy-tailed.
  • Forgetting “independent samples” for the two-sample formula.
  • Mixing up one-tailed vs two-tailed critical regions.

Chapter 5 Review: Correlation & Rank Correlation


Given paired data $(x_i,y_i)$ for $i=1,\ldots,n$,

$$r=\frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}},\quad S_{xy}=\sum xy-\frac{(\sum x)(\sum y)}{n}.$$

  • $H_0:\rho=0$
  • Compare $|r|$ with the critical value for $(n,\alpha)$.

Use ranks when:

  • relationship is monotonic but not linear, or data is ordinal, or outliers break Pearson.

If no ties, shortcut:

$$r_s = 1-\frac{6\sum d^2}{n(n^2-1)}.$$

Test $r_s$ using Spearman critical value tables.
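Both coefficients can be computed from scratch. The cubic data set below is invented to show a monotonic but non-linear case where $r_s = 1$ while $r < 1$:

```python
def pmcc(x, y):
    # r = S_xy / sqrt(S_xx * S_yy),  S_xy = sum(xy) - (sum x)(sum y)/n, etc.
    n = len(x)
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    syy = sum(b * b for b in y) - sum(y) ** 2 / n
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    # Shortcut r_s = 1 - 6*sum(d^2) / (n(n^2-1)); valid only with NO tied ranks.
    def ranks(v):
        return [sorted(v).index(e) + 1 for e in v]
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    n = len(x)
    return 1 - 6 * d2 / (n * (n * n - 1))

x = [1, 2, 3, 4, 5, 6]
y = [v**3 for v in x]          # monotonic but clearly non-linear
print(round(pmcc(x, y), 3))    # below 1: Pearson penalises the curvature
print(spearman(x, y))          # 1.0: the ranks agree perfectly
```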

Common Exam Pitfalls (From Examiner Reports)

  • Tied Ranks: If there are tied ranks, you MUST use the full PMCC formula on the ranks. The $1-\frac{6\sum d^2}{n(n^2-1)}$ shortcut is only valid when there are no ties!
  • Alphabetical coding: When given letters (e.g. Grades A, B, C), students sometimes rank them alphabetically instead of by their actual value/order.
  • Hypotheses: Always state hypotheses in terms of $\rho$ or $\rho_s$. Never use $r$ or state them just in words.
  • Contextual conclusion: Simply stating “there is correlation” is insufficient. You must include the direction and context (e.g. “there is evidence of positive correlation between age and price”).
  • Non-linear relationships: If a PMCC test shows no significant correlation, but a Spearman’s test shows significant correlation, it strongly suggests a monotonic but non-linear relationship exists.

Chapter 6 Review: $\chi^2$ Tests (Goodness of Fit & Independence)


$\chi^2$ Statistic (same structure for both tests)

$$\chi^2=\sum \frac{(O-E)^2}{E}.$$

Goodness of Fit:

  • Use when: one categorical variable, testing a specified distribution/model.
  • Hypotheses: $H_0$: the model fits; $H_1$: the model does not fit.
  • df: $df=k-1-m$, where $m$ is the number of parameters estimated from the data.

Independence:

  • Use when: two categorical variables; test for association.
  • Expected counts: $E_{ij}=\dfrac{(\text{row total})(\text{col total})}{\text{grand total}}$.
  • df: $(r-1)(c-1)$.
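The expected-count and statistic formulas for the independence test can be checked by hand. The 2×3 drink-by-year-group table below is made up:

```python
# Hypothetical observed counts: rows = year group, columns = drink choice.
observed = [[30, 45, 25],   # juniors:  tea, coffee, hot chocolate
            [20, 15, 15]]   # seniors

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# E_ij = (row total)(col total) / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi2, 3), df)   # compare chi2 against the critical value for this df
```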

Common Exam Pitfalls (From Examiner Reports)

  • Frequencies, not percentages: A Chi-squared test MUST use raw frequencies (counts). If given percentages, convert them back to frequencies first.
  • Hypotheses for estimated parameters: If you estimate a parameter (e.g. $\lambda=3.5$), do NOT include the 3.5 in your hypotheses. Write “$H_0$: A Poisson distribution is a suitable model” (not “Po(3.5)”).
  • Degrees of Freedom ($m$): Students often forget to subtract $m$ (the number of estimated parameters) when calculating $df = k - 1 - m$ for Goodness of Fit tests.
  • Pooling correctly: You pool cells to ensure Expected frequencies are $\ge 5$. Do not pool based solely on Observed frequencies!

One-Page Formula Sheet (Students Should Memorise)

  • Expectation: $E(aX \pm bY)=aE(X) \pm bE(Y)$
  • Variance (indep.): $\mathrm{Var}(aX \pm bY)=a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$
  • Multiple items: $\mathrm{Var}(X_1+\cdots+X_n)=n\,\mathrm{Var}(X)$
  • Sample mean: $\bar{X}=\dfrac{1}{n}\sum X_i$, $\mathrm{SE}(\bar{X})=\dfrac{\sigma}{\sqrt{n}}$
  • Sample variance: $S^2=\dfrac{1}{n-1}\sum (X_i-\bar{X})^2$
  • CI for mean (large $n$): $\bar{x}\pm z^*\dfrac{S}{\sqrt{n}}$
  • CLT: $\bar{X}\approx N\!\left(\mu,\dfrac{\sigma^2}{n}\right)$
  • PMCC: $r=\dfrac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$
  • Spearman (no ties): $r_s=1-\dfrac{6\sum d^2}{n(n^2-1)}$
  • $\chi^2$ statistic: $\chi^2=\sum\dfrac{(O-E)^2}{E}$
  • df (GOF): $k-1-m$; df (independence): $(r-1)(c-1)$