S3 Chapter 7: International Exam Review
How to Use This Review Pack
This handout is a high-yield revision guide for S3 topics: Sampling, Combinations of RVs, Estimation & CI, CLT & Mean Tests, Correlation, and $\chi^2$ Tests.
Running Case: HelloTea
We will keep using HelloTea to connect ideas:
- Population: all students (e.g. 3000).
- Sample: e.g. $n$ students chosen by a sampling method.
- Data types: ratings (1–5), drink choice (tea/coffee/hot chocolate), screen-time, etc.
Chapter 1 Review: Sampling Methods (Getting Good Data)
Core Definitions
Definition: Population, Sample, Sampling Frame
- Population: the full group of interest.
- Sample: the selected observations from the population.
- Sampling frame: the actual list you can sample from.
Four Methods You Must Know
| Method | Random? | How to do it | Main risk / limitation |
|---|---|---|---|
| Simple Random (SRS) | Yes | choose IDs using RNG / random number table | can be time-consuming; may miss small subgroups by chance |
| Systematic | Partly | choose a random start, then take every $k$th item | periodicity (hidden patterns in the list) |
| Stratified | Yes (within strata) | split into strata, SRS inside each | need strata info beforehand; more steps |
| Quota | No | set quotas, then convenience within each | selection bias; no valid sampling error / inference guarantee |
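The three random methods in the table can be sketched in a few lines of Python. This is only an illustration under assumptions: the frame of 3000 numbered students comes from the HelloTea case, but the year-group split, the sample size of 60, and all function names are invented here.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS: every item in the numbered frame has an equal chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Systematic: random start among the first k items, then every k-th item."""
    k = len(frame) // n              # sampling interval
    start = rng.randrange(k)         # random position 0..k-1
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n, rng):
    """Stratified: SRS inside each stratum, sized in proportion to the stratum."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(n * len(members) / total)   # proportional allocation
        sample.extend(rng.sample(members, share))
    return sample

rng = random.Random(2024)            # fixed seed so the sketch is repeatable
frame = list(range(1, 3001))         # sampling frame numbered 1 to 3000
# Hypothetical year-group split, invented for illustration only
strata = {"Y10": frame[:1200], "Y11": frame[1200:2100], "Y12": frame[2100:]}

srs = simple_random_sample(frame, 60, rng)
sys_sample = systematic_sample(frame, 60, rng)
strat = stratified_sample(strata, 60, rng)
```

Note that the frame is numbered before any random numbers are drawn, and that the systematic sample can never contain two adjacent IDs because consecutive picks are always 50 apart.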
Common Exam Pitfalls (From Examiner Reports)
- Missing the numbering step: Before using random numbers, you MUST explicitly state that you will “number/label the sampling frame (e.g., from 1 to $N$).”
- Systematic sampling errors: If the period is $k$, students often forget that two adjacent items in the list can never both be selected.
- Vague language: Stating a method is “more accurate” or “more representative” usually scores zero. Use precise terms like “reflects the population structure” (for stratified) or “gives every item an equal chance of selection” (for simple random).
- Quota vs. Stratified: Quota sampling suffers from interviewer bias (the person choosing who to survey), which means no valid sampling error can be calculated.
Chapter 2 Review: Combinations of Random Variables
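Core Rules for Expectation and Variance
For constants $a$, $b$ and random variables $X$, $Y$:
- $E(aX \pm bY) = aE(X) \pm bE(Y)$ (always).
- $\operatorname{Var}(aX \pm bY) = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y)$ (only when $X$ and $Y$ are independent).
The Biggest Exam Trap: $3X$ vs $X_1+X_2+X_3$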
| Scenario | Notation | Variance |
|---|---|---|
| “3 times the weight of a randomly chosen bag” | $3X$ | $\operatorname{Var}(3X) = 3^2\operatorname{Var}(X) = 9\operatorname{Var}(X)$ |
| “The total weight of 3 randomly chosen bags” | $X_1+X_2+X_3$ | $\operatorname{Var}(X_1+X_2+X_3) = 3\operatorname{Var}(X)$ |
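The gap between the two rows can be checked exactly on a small discrete distribution. The sketch below uses a made-up bag-weight distribution (the weights and probabilities are assumptions, not from any exam question) and exact fractions, so no simulation error is involved.

```python
from fractions import Fraction
from itertools import product

# Hypothetical bag-weight distribution (grams -> probability), for illustration
dist = {98: Fraction(1, 4), 100: Fraction(1, 2), 102: Fraction(1, 4)}

def variance(pmf):
    """Exact variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

# 3X: ONE bag, with its weight multiplied by 3
pmf_3x = {3 * x: p for x, p in dist.items()}

# X1 + X2 + X3: THREE independent bags added together
pmf_sum = {}
for combo in product(dist.items(), repeat=3):
    total = sum(x for x, _ in combo)
    prob = combo[0][1] * combo[1][1] * combo[2][1]
    pmf_sum[total] = pmf_sum.get(total, 0) + prob

var_x = variance(dist)
var_3x = variance(pmf_3x)      # equals 9 * var_x
var_sum = variance(pmf_sum)    # equals 3 * var_x
print(var_x, var_3x, var_sum)
```

For this distribution the single-bag variance is 2, so the two scenarios give 18 and 6 respectively.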
Common Exam Pitfalls (From Examiner Reports)
- Subtracting variances: Students frequently write $\operatorname{Var}(X - Y) = \operatorname{Var}(X) - \operatorname{Var}(Y)$. This is WRONG! Variances always add for independent variables: $\operatorname{Var}(X - Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$.
- Averaging variables: To find the variance of the sample mean of 5 observations, $\bar{X} = \frac{X_1 + \dots + X_5}{5}$, you must square the denominator: $\operatorname{Var}(\bar{X}) = \frac{\operatorname{Var}(X_1 + \dots + X_5)}{5^2} = \frac{5\sigma^2}{25} = \frac{\sigma^2}{5}$. Many incorrectly divide the variance of the sum by 5 instead of 25.
- Difference without direction: If the question asks for the probability of a “difference” in weight being greater than 5g, you must calculate $P(|X - Y| > 5) = P(X - Y > 5) + P(X - Y < -5)$ (two tails).
- Standardisation sign errors: When equating your standardisation to a critical value (e.g. 1.2816), make sure the signs match. If the probability area implies $x$ is below the mean, $z$ must be negative!
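A worked version of the “difference without direction” pitfall, as a rough sketch. The two normal distributions are invented for illustration; the point is that the variances add and that both tails are counted.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical independent bag weights (grams): X ~ N(250, 3^2), Y ~ N(248, 4^2)
mu_x, sd_x = 250, 3
mu_y, sd_y = 248, 4

# D = X - Y: the means subtract, but the variances ADD
diff = NormalDist(mu_x - mu_y, sqrt(sd_x**2 + sd_y**2))   # N(2, 5^2)

# "Difference greater than 5 g" means |D| > 5, so both tails are needed
p_upper = 1 - diff.cdf(5)      # P(D > 5)
p_lower = diff.cdf(-5)         # P(D < -5)
p_two_tails = p_upper + p_lower
print(round(p_two_tails, 4))
```

Reporting only `p_upper` would understate the answer, because it ignores the case where the second bag is heavier by more than 5 g.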
Chapter 3 Review: Estimation, Bias, Standard Error, Confidence Interval
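The Three Layers: Parameter, Statistic, Value
- Parameter: a fixed but unknown population quantity, e.g. $\mu$.
- Statistic (estimator): a random variable calculated from the sample, e.g. $\bar{X}$.
- Value (estimate): the number obtained from one observed sample, e.g. $\bar{x}$.
Definition: Bias
The bias of an estimator $\hat{\theta}$ of $\theta$ is $E(\hat{\theta}) - \theta$; the estimator is unbiased when $E(\hat{\theta}) = \theta$.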
Standard Error (SE)
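Definition: Standard Error
The standard error of an estimator is its standard deviation. For the sample mean, $\operatorname{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$.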
Confidence Intervals (CI)
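Definition: Generic Form
$$\text{estimate} \pm (\text{critical value}) \times \text{SE}$$
For a population mean with large $n$: $\bar{x} \pm z\,\frac{\sigma}{\sqrt{n}}$.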
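The generic form, estimate plus or minus critical value times standard error, can be sketched for a large-sample mean as follows. The function name and the HelloTea figures (mean rating 3.8, standard deviation 1.2, 64 students) are assumptions made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma, n, level=0.95):
    """Large-sample CI: estimate +/- (critical value) x (standard error)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    se = sigma / sqrt(n)                        # standard error of the mean
    return x_bar - z * se, x_bar + z * se

# Hypothetical HelloTea sample: mean rating 3.8, sigma 1.2, n = 64
low, high = mean_confidence_interval(3.8, 1.2, 64)
print(round(low, 3), round(high, 3))
```

Dividing by `sqrt(n)` is the step that turns the standard deviation of individual ratings into the standard error of the mean.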
Common Exam Pitfalls (From Examiner Reports)
- Wrong interpretation: “95% probability that $\mu$ is in this interval” (not correct phrasing). Instead say: “We are 95% confident that the true population mean lies within this interval.”
- Mixing SD and SE: SD describes individual items; SE describes estimator variability. Don’t forget to divide by $\sqrt{n}$ when calculating the standard error!
- Hypotheses notation: Always use population parameters (e.g. $\mu$) in hypotheses, never sample statistics ($\bar{x}$). Also, define your subscripts clearly (e.g. $\mu_A$ vs $\mu_B$).
- CIs as a Binomial process: If asked for the probability that $k$ out of $n$ calculated confidence intervals contain $\mu$, you must use the Binomial distribution $B(n, p)$, where $p$ is the confidence level (e.g. $0.95$).
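The “CIs as a Binomial process” point can be illustrated with a short calculation. The numbers (10 independent 95% intervals) are hypothetical, and the function name is invented for this sketch.

```python
from math import comb

def prob_k_intervals_cover(k, n, level=0.95):
    """P(exactly k of n independent CIs contain the parameter), X ~ B(n, level)."""
    return comb(n, k) * level**k * (1 - level) ** (n - k)

# Hypothetical question: ten independent 95% confidence intervals are calculated
p_all_ten = prob_k_intervals_cover(10, 10)   # all ten contain mu: 0.95^10
p_nine = prob_k_intervals_cover(9, 10)       # exactly nine contain mu
print(round(p_all_ten, 4), round(p_nine, 4))
```

Each interval is treated as a single Bernoulli trial whose success probability is the confidence level.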
Chapter 4 Review: CLT & Inference for Means
What CLT Actually Says
Theorem: Central Limit Theorem (usable form) If $X_1, X_2, \dots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, then for large $n$,
$$\bar{X} \approx N\!\left(\mu, \frac{\sigma^2}{n}\right).$$
One-Sample Mean Test (Large-Sample $z$-test idea)
To test $H_0: \mu = \mu_0$,
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}},$$
using $s$ in place of $\sigma$ when $\sigma$ is unknown and $n$ is large.
Decision by critical value or $p$-value.
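A minimal sketch of the large-sample one-sample test, assuming a hypothetical HelloTea question ($H_0: \mu = 3.5$ against $H_1: \mu > 3.5$, with sample mean 3.8, sample SD 1.2 and $n = 64$). The function and its `tail` argument are illustrative, not a standard library API.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(x_bar, mu_0, sd, n, alpha=0.05, tail="two"):
    """Return (z, p_value, reject_H0) for a large-sample test of H0: mu = mu_0."""
    z = (x_bar - mu_0) / (sd / sqrt(n))
    std_normal = NormalDist()
    if tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif tail == "upper":
        p_value = 1 - std_normal.cdf(z)
    else:  # "lower"
        p_value = std_normal.cdf(z)
    return z, p_value, p_value < alpha

# Hypothetical data: H0: mu = 3.5, H1: mu > 3.5
z, p_value, reject = one_sample_z_test(3.8, 3.5, 1.2, 64, tail="upper")
print(round(z, 3), round(p_value, 4), reject)
```

Here the test statistic is about 2, which exceeds the one-tailed 5% critical value of 1.6449, so the critical-value and $p$-value routes agree.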
Difference of Two Means (Independent Samples)
If two independent large samples:
$$\bar{X} - \bar{Y} \approx N\!\left(\mu_x - \mu_y, \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}\right).$$
Use estimated SE with sample SDs: $\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$.
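The two-sample version follows the same pattern, with the standard error built from both sample variances. The summary statistics below are invented for illustration, and the function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean_x, s_x, n_x, mean_y, s_y, n_y):
    """Large independent samples: test H0: mu_x = mu_y using the estimated SE."""
    se = sqrt(s_x**2 / n_x + s_y**2 / n_y)     # variances add, then square root
    z = (mean_x - mean_y) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, se, p_two_tailed

# Hypothetical summary statistics for two independent groups of 100 students
z, se, p_two_tailed = two_sample_z(52, 6, 100, 50, 8, 100)
print(round(z, 3), round(se, 3), round(p_two_tailed, 4))
```

The formula is only valid when the two samples are independent of each other.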
Common Exam Pitfalls (From Examiner Reports)
- Explaining the CLT: Many students lose marks by saying “the sample is normally distributed.” You must say “the sample mean is approximately normally distributed.”
- Combining samples: When asked to treat two samples as a single combined sample, do not calculate a weighted standard deviation. Find the new overall mean and calculate the standard error for the new total size $n_1 + n_2$.
- Using CLT with small $n$ when the population is clearly skewed/heavy-tailed.
- Forgetting “independent samples” for the two-sample formula.
- Mixing up one-tailed vs two-tailed critical regions.
Chapter 5 Review: Correlation & Rank Correlation
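PMCC (Pearson) Recap
Given paired data $(x_i, y_i)$ for $i = 1, \dots, n$,
$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \quad S_{xy} = \sum x_i y_i - \frac{\sum x_i \sum y_i}{n}, \quad S_{xx} = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n},$$
with $S_{yy}$ defined in the same way as $S_{xx}$.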
Testing for PMCC (Table Method)
- Compare $r$ with the critical value for sample size $n$.
Spearman’s Rank Correlation
Use ranks when:
- relationship is monotonic but not linear, or data is ordinal, or outliers break Pearson.
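If no ties, shortcut:
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)},$$
where $d$ is the difference between the two ranks of each pair.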
Test using Spearman critical value tables.
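As a sanity check on the shortcut, the sketch below computes Spearman's coefficient two ways on a small made-up data set with no ties: with the $\sum d^2$ shortcut and by applying the PMCC formula to the ranks. The age and price figures and all function names are hypothetical.

```python
from math import sqrt

def ranks(values):
    """Rank 1 = smallest value; this simple version assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def pmcc(x, y):
    """Pearson's r from S_xy, S_xx and S_yy."""
    n = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)

def spearman_shortcut(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n(n^2 - 1)); valid only when there are no ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical paired data: age of an item (years) and its price
age = [1, 2, 3, 4, 5, 6]
price = [9.5, 9.0, 7.0, 7.5, 4.0, 1.0]

r_s_shortcut = spearman_shortcut(age, price)
r_s_from_ranks = pmcc(ranks(age), ranks(price))
print(round(r_s_shortcut, 4), round(r_s_from_ranks, 4))
```

With tied ranks the two results would no longer agree, and only the PMCC-on-ranks route remains valid.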
Common Exam Pitfalls (From Examiner Reports)
- Tied Ranks: If there are tied ranks, you MUST use the full PMCC formula on the ranks. The shortcut is only valid when there are no ties!
- Alphabetical coding: When given letters (e.g., Grades A, B, C), students sometimes rank them alphabetically instead of by their actual value/order.
- Hypotheses: Always state hypotheses in terms of $\rho$ or $\rho_s$. Never use $r$ or $r_s$, and never state them just in words.
- Contextual conclusion: Simply stating “there is correlation” is insufficient. You must include the direction and context (e.g., “there is evidence of positive correlation between age and price”).
- Non-linear relationships: If a PMCC test shows no significant correlation, but a Spearman’s test shows significant correlation, this suggests the relationship is monotonic but non-linear.
Chapter 6 Review: $\chi^2$ Tests (Goodness of Fit & Independence)
$\chi^2$ Statistic (same structure for both tests)
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i},$$
where $O_i$ are the observed frequencies and $E_i$ the expected frequencies.
Goodness of Fit (GOF)
- Use when: one categorical variable, testing a specified distribution/model.
- Hypotheses: $H_0$: model fits; $H_1$: model does not fit.
- df: $\nu = k - 1 - m$, where $k$ = number of cells (after pooling) and $m$ = number of parameters estimated from data.
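A goodness-of-fit calculation, sketched under assumptions: the die-rolling counts are invented, the model is fully specified (so no parameters are estimated), and the critical value is the standard 5% table value for 5 degrees of freedom.

```python
def chi_squared_stat(observed, expected):
    """Sum of (O - E)^2 / E over all cells; inputs must be frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 120 rolls of a die, H0: a uniform model fits
observed = [25, 17, 15, 23, 24, 16]
expected = [sum(observed) / 6] * 6      # 20 per face under H0

stat = chi_squared_stat(observed, expected)
parameters_estimated = 0                # the model is fully specified
df = len(observed) - 1 - parameters_estimated
critical_value = 11.070                 # chi-squared tables: 5 df, 5% level
reject_h0 = stat > critical_value
print(stat, df, reject_h0)
```

If a parameter had been estimated from the data (for example a Poisson mean), `parameters_estimated` would be 1 and the degrees of freedom would drop by one.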
Independence in a Contingency Table
- Use when: two categorical variables; test for association.
- Expected: $E_{ij} = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$.
- df: $\nu = (R - 1)(C - 1)$ for a table with $R$ rows and $C$ columns.
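The expected-frequency rule and the degrees of freedom for a contingency table can be sketched as below. The 2 by 3 table of HelloTea drink choices is hypothetical, and the critical value is the standard 5% table value for 2 degrees of freedom.

```python
def expected_table(observed):
    """E_ij = (row total x column total) / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical counts: rows = two year groups, columns = tea / coffee / hot chocolate
observed = [[30, 20, 10],
            [20, 30, 10]]

expected = expected_table(observed)
stat = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.991                  # chi-squared tables: 2 df, 5% level
reject_h0 = stat > critical_value
print(expected, stat, df, reject_h0)
```

In this made-up table the statistic is 4.0 on 2 degrees of freedom, so there is insufficient evidence of an association at the 5% level.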
Conclusion Sentence Template
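At the $\alpha\%$ significance level there is [sufficient / insufficient] evidence to reject $H_0$, so [conclusion stated in the context of the question].
Common Exam Pitfalls (From Examiner Reports)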
- Frequencies, not percentages: A Chi-squared test MUST use raw frequencies (counts). If given percentages, convert them back to frequencies first.
- Hypotheses for estimated parameters: If you estimate a parameter (e.g. $\lambda = 3.5$), do NOT include the 3.5 in your hypotheses. Write “$H_0$: A Poisson distribution is a suitable model” (not “Po(3.5)”).
- Degrees of Freedom ($\nu$): Students often forget to subtract $m$ (the number of estimated parameters) when calculating $\nu$ for Goodness of Fit tests.
- Pooling correctly: You pool cells to ensure Expected frequencies are $\geq 5$. Do not pool based solely on Observed frequencies!
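Pooling and the degrees-of-freedom adjustment can be combined in one small sketch. It assumes a hypothetical Poisson fit whose mean was estimated from the data and only pools from the right-hand tail; the frequencies and the function name are invented for illustration.

```python
def pool_right_tail(observed, expected, minimum=5):
    """Merge the last cell into its neighbour until the last EXPECTED value >= minimum."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        last_exp = exp.pop()
        last_obs = obs.pop()
        exp[-1] += last_exp      # the decision is driven by expected frequencies
        obs[-1] += last_obs      # observed cells are merged to match
    return obs, exp

# Hypothetical Poisson fit (mean estimated from the data), cells 0, 1, 2, 3, 4, 5+
observed = [10, 20, 15, 6, 3, 1]
expected = [12.1, 18.4, 13.9, 7.0, 2.7, 0.9]

pooled_obs, pooled_exp = pool_right_tail(observed, expected)
parameters_estimated = 1                       # the Poisson mean
df = len(pooled_exp) - 1 - parameters_estimated
print(pooled_obs, [round(e, 1) for e in pooled_exp], df)
```

The cell count used for the degrees of freedom is the number of cells left after pooling, not the number in the original table.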
One-Page Formula Sheet (Students Should Memorise)
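- Expectation: $E(aX \pm bY) = aE(X) \pm bE(Y)$
- Variance (Indep): $\operatorname{Var}(aX \pm bY) = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y)$
- Multiple items: $\operatorname{Var}(X_1 + \dots + X_n) = n\operatorname{Var}(X)$, whereas $\operatorname{Var}(nX) = n^2\operatorname{Var}(X)$
- Sample mean: $E(\bar{X}) = \mu$, $\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}$
- Sample variance: $s^2 = \frac{1}{n-1}\left(\sum x^2 - n\bar{x}^2\right)$
- CI for mean (large $n$): $\bar{x} \pm z\,\frac{\sigma}{\sqrt{n}}$
- CLT: $\bar{X} \approx N\!\left(\mu, \frac{\sigma^2}{n}\right)$
- PMCC: $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
- Spearman (no ties): $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
- $\chi^2$: $\sum \frac{(O - E)^2}{E}$
- df GOF: $k - 1 - m$ ($k$ cells, $m$ estimated parameters), df independence: $(R - 1)(C - 1)$ ($R$ rows, $C$ columns)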