S3 Chapter 1: Sampling Methods
From Guesswork to Science: How We Study Populations
Imagine you’re the manager of a popular bubble tea shop near your school, “HelloTea.” You want to know what students think about your store so you can make it even better.
Starting Simple: Your First Instinct
The Most Natural Approach
Let’s begin with the most intuitive idea: what if we could give every student an equal chance of being selected, with no favoritism or bias? This is the foundation of scientific sampling.
Definition: Simple Random Sampling
Simple random sampling is a method where every member of the population has an equal and independent chance of being selected. Every possible sample of size $n$ has the same probability of being chosen.
The Core Principle: Complete randomness eliminates human bias. You’re not choosing students based on convenience, appearance, friendliness, or any other factor that might skew results.
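To make the "every possible sample is equally likely" idea concrete, here is a minimal sketch that lists every sample of size 2 from a hypothetical mini-population of 5 students. The names `population`, `all_samples` and `inclusion_prob` are illustrative only and not part of the HelloTea survey.

```python
from itertools import combinations
from fractions import Fraction

# Hypothetical mini-population of N = 5 students; we draw samples of size n = 2.
population = ["A", "B", "C", "D", "E"]
n = 2

# Under simple random sampling, every possible sample is equally likely.
all_samples = list(combinations(population, n))
prob_each_sample = Fraction(1, len(all_samples))

# Fraction of the equally likely samples that contain each student.
inclusion_prob = {
    student: Fraction(sum(student in s for s in all_samples), len(all_samples))
    for student in population
}

print(len(all_samples))      # 10 possible samples
print(prob_each_sample)      # 1/10
print(inclusion_prob["A"])   # 2/5, which equals n / N
```

Every student appears in the same share of samples, which is what "equal chance of being selected" means in practice.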
How to Execute Simple Random Sampling
Step 1: Create a Sampling Frame
First, you need a complete list of all students. Let’s say you obtain the school’s student database and assign each student a unique number from 0001 to 3000.
Step 2: Use a Random Number Table
To ensure true randomness, we use a random number table - a pre-generated table of digits arranged with no predictable pattern.
| 86 | 13 | 84 | 10 | 07 | 30 | 39 | 05 | 97 | 96 | 88 | 07 | 37 | 26 | 04 | 89 | 13 | 48 | 19 | 20 |
| 60 | 78 | 48 | 12 | 99 | 47 | 09 | 46 | 91 | 33 | 17 | 21 | 03 | 94 | 79 | 00 | 08 | 50 | 40 | 16 |
| 78 | 48 | 06 | 37 | 82 | 26 | 01 | 06 | 64 | 65 | 94 | 41 | 17 | 26 | 74 | 66 | 61 | 93 | 24 | 97 |
| 80 | 56 | 90 | 79 | 66 | 94 | 18 | 40 | 97 | 79 | 93 | 20 | 41 | 51 | 25 | 04 | 20 | 71 | 76 | 04 |
| 99 | 09 | 39 | 25 | 66 | 31 | 70 | 56 | 30 | 15 | 52 | 17 | 87 | 55 | 31 | 11 | 10 | 68 | 98 | 23 |
| 56 | 32 | 32 | 72 | 91 | 65 | 97 | 36 | 56 | 61 | 12 | 79 | 95 | 17 | 57 | 16 | 53 | 58 | 96 | 36 |
| 66 | 02 | 49 | 93 | 97 | 44 | 99 | 15 | 56 | 86 | 80 | 57 | 11 | 78 | 40 | 23 | 58 | 40 | 86 | 14 |
| 31 | 77 | 53 | 94 | 05 | 93 | 56 | 14 | 71 | 23 | 60 | 46 | 05 | 33 | 23 | 72 | 93 | 10 | 81 | 23 |
| 98 | 79 | 72 | 43 | 14 | 76 | 54 | 77 | 66 | 29 | 84 | 09 | 88 | 56 | 75 | 86 | 41 | 67 | 04 | 42 |
| 50 | 97 | 92 | 15 | 10 | 01 | 57 | 01 | 87 | 33 | 73 | 17 | 70 | 18 | 40 | 21 | 24 | 20 | 66 | 62 |
| 90 | 51 | 94 | 50 | 12 | 48 | 88 | 95 | 09 | 34 | 09 | 30 | 22 | 27 | 25 | 56 | 40 | 76 | 01 | 59 |
| 31 | 99 | 52 | 24 | 13 | 43 | 27 | 88 | 11 | 39 | 41 | 65 | 00 | 84 | 13 | 06 | 31 | 79 | 74 | 97 |
| 22 | 96 | 23 | 34 | 46 | 12 | 67 | 11 | 48 | 06 | 99 | 24 | 14 | 83 | 78 | 37 | 65 | 73 | 39 | 47 |
| 06 | 84 | 55 | 41 | 27 | 06 | 74 | 59 | 14 | 29 | 20 | 14 | 45 | 75 | 31 | 16 | 05 | 41 | 22 | 96 |
| 08 | 64 | 89 | 30 | 25 | 25 | 71 | 35 | 33 | 31 | 04 | 56 | 12 | 67 | 03 | 74 | 07 | 16 | 49 | 32 |
| 86 | 87 | 62 | 43 | 15 | 11 | 76 | 49 | 79 | 13 | 78 | 80 | 93 | 89 | 09 | 57 | 07 | 14 | 40 | 74 |
| 94 | 44 | 97 | 13 | 77 | 04 | 35 | 02 | 12 | 76 | 60 | 91 | 93 | 40 | 81 | 06 | 85 | 85 | 72 | 84 |
| 63 | 25 | 55 | 14 | 66 | 47 | 99 | 90 | 02 | 90 | 83 | 43 | 16 | 01 | 19 | 69 | 11 | 78 | 87 | 16 |
| 11 | 22 | 83 | 98 | 15 | 21 | 18 | 57 | 53 | 42 | 91 | 91 | 26 | 52 | 89 | 13 | 86 | 00 | 47 | 61 |
| 01 | 70 | 10 | 83 | 94 | 71 | 13 | 67 | 11 | 12 | 36 | 54 | 53 | 32 | 90 | 43 | 79 | 01 | 95 | 15 |
Step 3: Select Your Sample
Let’s walk through the process together:
- Choose a starting point randomly: Close your eyes and point to any position in the table. Let’s say you land on Row 3, Column 2.
- Decide on a reading pattern: Since our student IDs are 4-digit numbers (0001-3000), we’ll read 4 digits at a time. We’ll read across rows from left to right.
- Extract numbers systematically:
  - Starting at Row 3, Column 2: We see “48”. Continue reading: “48”, “06”, “37”, “82”, …
  - Combine into 4-digit groups: 4806, 3782, 2601, 0664, 6594, 4117, 2674…
  - Wait! 4806 is larger than 3000 - we skip it!
  - Continue: 3782 - skip (too large), 2601 - Select student #2601!
- Continue until you have 200 unique numbers:
  - Skip any number that appears twice or any invalid numbers, such as numbers larger than 3000 or smaller than 0001.
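The walk-through above can be mimicked in a few lines of Python. This is only a sketch: `ROW_3` is Row 3 of the table typed in by hand, and `read_ids` is a hypothetical helper name, not a standard function.

```python
# Row 3 of the random number table, as printed (two-digit cells).
ROW_3 = ["78", "48", "06", "37", "82", "26", "01", "06", "64", "65",
         "94", "41", "17", "26", "74", "66", "61", "93", "24", "97"]

def read_ids(cells, start_col, population_size, digits=4):
    """Read fixed-width IDs from table cells, keeping valid, not-yet-seen numbers."""
    stream = "".join(cells[start_col - 1:])   # columns are counted from 1 in the text
    selected = []
    for i in range(0, len(stream) - digits + 1, digits):
        number = int(stream[i:i + digits])
        # Skip numbers outside 0001..population_size and repeats.
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
    return selected

print(read_ids(ROW_3, start_col=2, population_size=3000))
# [2601, 664, 2674]
```

The output corresponds to students #2601, #0664 and #2674. One row is far too short to reach 200 students, so in practice you would keep reading into the following rows. If software is allowed, Python's standard `random.sample(range(1, 3001), 200)` draws 200 distinct IDs in one call.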
Discovering Problems: When Simple Random Sampling Falls Short
After implementing your simple random sampling survey, you encounter some real-world challenges:
The Key Realization: While simple random sampling is unbiased and theoretically perfect, it can be inefficient and might miss important subgroups by chance.
Making It Easier: Systematic Sampling
A More Efficient Approach
Let’s tackle the first problem: the efficiency concern. What if there was a way to maintain randomness while making the process more organized?
The New Plan: Instead of selecting completely random students, you’ll use the student database ordered alphabetically by name. You’ll select every $k$-th student from the list.
Definition: Systematic Sampling
Systematic sampling is a method where you select every $k$-th member from an ordered sampling frame, starting from a randomly chosen position between 1 and $k$.
The sampling interval $k$ is calculated as:
$$k = \frac{N}{n}$$
where $N$ is the population size and $n$ is the sample size.
How to Execute Systematic Sampling
Step 1: Calculate the Sampling Interval
With a population of 3,000 students and a sample of 200 students:
$$k = \frac{3000}{200} = 15$$
This means you’ll select every 15th student from your list.
Step 2: Choose a Random Starting Point
Use your random number table to select a number between 1 and 15. Let’s say you get 12.
Step 3: Select Students Systematically
Starting with student #12, select every 15th student:
- Student #12 (starting point), Student #27 ($12 + 15$)
- Student #42 ($12 + 2 \times 15$), Student #57 ($12 + 3 \times 15$)
- … continue until you reach student #2997 ($12 + 199 \times 15$)
You’ll automatically get exactly 200 students!
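The three steps can be sketched in Python as follows. The function name `systematic_sample` is hypothetical, and the sketch assumes the population size divides evenly by the sample size, as it does for 3,000 and 200.

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Return 1-based list positions chosen by systematic sampling."""
    k = population_size // sample_size      # sampling interval
    if start is None:
        start = rng.randint(1, k)           # random start between 1 and k inclusive
    return list(range(start, population_size + 1, k))[:sample_size]

# Reproduce the example: 3,000 students, sample of 200, random start of 12.
chosen = systematic_sample(3000, 200, start=12)
print(chosen[:4], chosen[-1], len(chosen))
# [12, 27, 42, 57] 2997 200
```

Leaving `start` unset lets the function pick the starting point itself, which replaces the random number table lookup in Step 2.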
Advantages and Disadvantage of Systematic Sampling
However, systematic sampling has a critical weakness: periodicity.
Example: The Apartment Building Survey
Scenario: A researcher wants to survey residents in a 20-floor apartment building. Each floor has 15 apartments numbered 01-15. The building database lists apartments in order: 101, 102, …, 115, 201, 202, …, 215, … 2001, …, 2015.
There are 300 total apartments. The researcher wants a sample of 20 apartments.
(a) Calculate the sampling interval $k$: __________
(b) If you randomly start with the 7th apartment on the list, list the first 5 apartments you would survey:
(c) List the last 2 apartments in your sample:
(d) What is the potential problem with this sampling method?
Ensuring Representation: Stratified Sampling
The Representation Challenge Revisited
Remember our second problem? By chance, simple random sampling might give us too few heavy customers or miss important subgroups entirely. This is especially problematic when:
- Different subgroups might have very different opinions
- Some subgroups are small but critically important
- You want to compare across subgroups
The Key Insight: What if we could guarantee representation from each important subgroup?
Example: HelloTea: Understanding Your Customer Base
Through preliminary research, you discover students fall into four distinct groups based on tea consumption:
| Customer Type | Definition | Count |
|---|---|---|
| Heavy Users | ≥ 3 visits/week | 300 students (10%) |
| Regular Users | 1-2 visits/week | 1,200 students (40%) |
| Light Users | 1-3 visits/month | 900 students (30%) |
| Rare/Non-users | < 1 visit/month | 600 students (20%) |
| Total | | 3,000 students (100%) |
The Business Question: Heavy users contribute the most revenue - their opinions are crucial! How can you ensure they’re adequately represented in your sample of 200?
Definition: Stratified Random Sampling
Stratified random sampling divides the population into non-overlapping groups called strata (plural of stratum) based on specific characteristics. Then, simple random sampling is performed independently within each stratum.
How to Execute Stratified Sampling
Step 1: Identify Your Strata
Choose characteristics that are:
- Relevant to your research question
- Known before sampling (you must be able to classify population members)
- Creating groups that are internally similar but different from each other
For HelloTea: The four user types (Heavy, Regular, Light, Rare)
Step 2: Determine Sample Sizes for Each Stratum
Option A - Proportional Allocation: Match population proportions
| Stratum | Population | Proportion | Sample Size |
|---|---|---|---|
| Heavy Users | 300 | 10% | $200 \times 10\% = 20$ |
| Regular Users | 1,200 | 40% | $200 \times 40\% = 80$ |
| Light Users | 900 | 30% | $200 \times 30\% = 60$ |
| Rare/Non-users | 600 | 20% | $200 \times 20\% = 40$ |
| Total | 3,000 | 100% | 200 |
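Proportional allocation is simple arithmetic: each stratum's sample size is the total sample size multiplied by that stratum's share of the population. A minimal sketch, with illustrative variable names:

```python
# Stratum sizes from the HelloTea customer table.
strata = {
    "Heavy Users": 300,
    "Regular Users": 1200,
    "Light Users": 900,
    "Rare/Non-users": 600,
}
total_sample = 200
population_size = sum(strata.values())   # 3000

# Sample size per stratum = total sample * (stratum size / population size).
# Integer division is exact here; other figures may need a rounding rule.
allocation = {
    name: total_sample * size // population_size
    for name, size in strata.items()
}
print(allocation)
# {'Heavy Users': 20, 'Regular Users': 80, 'Light Users': 60, 'Rare/Non-users': 40}
```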
Step 3: Randomly Sample Within Each Stratum
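One way this step could be carried out in software is to run a simple random sample separately inside each stratum. The sketch below assumes a hypothetical sampling frame in which student IDs happen to be grouped by customer type. The ID ranges are made up for illustration, and the fixed seed is there only so the example is repeatable.

```python
import random

# Hypothetical frames: student IDs grouped by stratum (ID ranges are invented).
frames = {
    "Heavy Users": list(range(1, 301)),          # 300 students
    "Regular Users": list(range(301, 1501)),     # 1,200 students
    "Light Users": list(range(1501, 2401)),      # 900 students
    "Rare/Non-users": list(range(2401, 3001)),   # 600 students
}
# Proportional sample sizes for a total sample of 200.
sample_sizes = {"Heavy Users": 20, "Regular Users": 80,
                "Light Users": 60, "Rare/Non-users": 40}

rng = random.Random(2024)   # fixed seed, for a repeatable illustration only

# Simple random sampling (without replacement) inside each stratum.
stratified_sample = {
    name: rng.sample(frame, sample_sizes[name])
    for name, frame in frames.items()
}

for name, ids in stratified_sample.items():
    print(name, len(ids))
```

Because each stratum is sampled independently, the 20 heavy users are guaranteed to appear in the sample rather than being left to chance.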
When You Don’t Have a List: Quota Sampling
The Practical Constraint
Imagine you’re conducting market research for HelloTea, but you encounter a new problem:
The Practical Alternative: Quota Sampling
Definition: Quota Sampling
Quota sampling is a method where you divide the population into groups (like in stratified sampling) and set target numbers (quotas) for each group. However, instead of random selection, you use convenience sampling within each group until quotas are filled.
Key difference from stratified sampling: Selection within groups is non-random.
How to Execute Quota Sampling
Step 1: Define Your Quotas
Based on your knowledge of the customer base, you set the same targets as stratified sampling:
| Customer Type | Quota |
|---|---|
| Heavy Users (≥ 3 visits/week) | 20 |
| Regular Users (1-2 visits/week) | 80 |
| Light Users (1-3 visits/month) | 60 |
| Rare/Non-users (< 1 visit/month) | 40 |
| Total | 200 |
Step 2: Convenience Sampling Until Quotas Are Met
You station your survey team near HelloTea and the school cafeteria. They approach students and:
- Ask a screening question: “How often do you visit HelloTea?”
- Based on the answer, classify the student into a group
- If that group’s quota isn’t full, conduct the survey
- If the quota is full, politely decline and move to the next student
- Stop when all quotas are filled
Example conversation:
- Surveyor: “Excuse me, how often do you visit HelloTea?”
- Student: “About twice a week.”
- Surveyor checks: Regular Users quota: 65/80 filled
- Surveyor: “Great! Would you mind answering a few questions about your experience?”
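The surveyors' routine amounts to a simple loop: classify each passer-by, survey them if their group still has room, and stop once every quota is full. The sketch below uses tiny made-up quotas and names so the behaviour is easy to follow. `run_quota_survey` is a hypothetical helper, and the group label stands in for the answer to the screening question.

```python
def run_quota_survey(passersby, quotas):
    """passersby: (name, group) pairs in the order the surveyors meet them."""
    filled = {group: 0 for group in quotas}
    surveyed = []
    for name, group in passersby:
        if filled[group] < quotas[group]:
            filled[group] += 1          # quota not full: conduct the survey
            surveyed.append(name)
        # otherwise: politely decline and move on to the next student
        if filled == quotas:
            break                       # all quotas filled, stop surveying
    return surveyed, filled

# Tiny illustrative quotas (the real survey would use 20 / 80 / 60 / 40).
quotas = {"Heavy": 1, "Regular": 2}
passersby = [("Ana", "Regular"), ("Ben", "Regular"), ("Cy", "Regular"),
             ("Di", "Heavy"), ("Eli", "Heavy")]

surveyed, filled = run_quota_survey(passersby, quotas)
print(surveyed, filled)
# ['Ana', 'Ben', 'Di'] {'Heavy': 1, 'Regular': 2}
```

Notice that who gets surveyed depends entirely on who happens to walk past first, which is why selection within groups is non-random.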
Advantages and Disadvantages of Quota Sampling
Putting It All Together: Choosing the Right Method
Now that we’ve explored all four methods through the journey of solving real problems, let’s synthesize what we’ve learned.
| Method | Random Selection? | Main Advantage | Main Disadvantage |
|---|---|---|---|
| Simple | Yes - completely random | Unbiased, theoretically perfect | Can be expensive; might miss subgroups by chance |
| Systematic | Partially - random start, then systematic | Easy to execute; ensures spread | Vulnerable to periodicity; less random than SRS |
| Stratified | Yes - within stratum | Guarantees subgroup representation; more precise | Requires knowing strata beforehand; more complex |
| Quota | No - convenience within quotas | Fast, cheap, no list needed | Non-probability; cannot calculate sampling error |
Exercises
Past Paper Questions
Reflection and Key Takeaways
The Deeper Lesson
Ethical Considerations
Sampling isn’t just technical - it’s ethical:
- Representation: Do our sampling methods give everyone a fair voice?
- Bias: Are we systematically excluding certain groups?
- Transparency: Are we honest about our methods and limitations?
- Misuse: Could our results be misinterpreted or misused?
Example: If an election poll surveys voters in only one district, it misses the voters in every other district, so their views are not properly represented.
Connection to Broader Statistics
The sampling methods you’ve learned are foundational to:
- Statistical inference: Drawing conclusions about populations from samples
- Confidence intervals: Quantifying uncertainty in estimates
- Goodness of fit tests: Testing how well a model fits the data
- Rank correlation tests: Testing whether two variables are related in a monotonic way
Looking ahead: In future chapters, you’ll learn how to analyze the data collected through these sampling methods, test hypotheses, and draw rigorous conclusions with quantified uncertainty.