S3 Chapter 1: Sampling Methods

From Guesswork to Science: How We Study Populations

Imagine you’re the manager of a popular bubble tea shop near your school, “HelloTea.” You want to know what students think about your store so you can make it even better.

Starting Simple: Your First Instinct

The Most Natural Approach

Let’s begin with the most intuitive idea: what if we could give every student an equal chance of being selected, with no favoritism or bias? This is the foundation of scientific sampling.

Definition: Simple Random Sampling Simple random sampling is a method where every member of the population has an equal and independent chance of being selected. Every possible sample of size $n$ has the same probability of being chosen.
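
The definition can be put into symbols. Writing $N$ for the population size (a symbol introduced here for illustration) and $n$ for the sample size, and assuming members are drawn without replacement, there are $\binom{N}{n}$ possible samples, so:

$P(\text{any particular sample of size } n) = \frac{1}{\binom{N}{n}}$

$P(\text{a given member is selected}) = \frac{n}{N}$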

The Core Principle: Complete randomness eliminates human bias. You’re not choosing students based on convenience, appearance, friendliness, or any other factor that might skew results.

How to Execute Simple Random Sampling

Step 1: Create a Sampling Frame

First, you need a complete list of all students. Let’s say you obtain the school’s student database and assign each student a unique number from 0001 to 3000.

Step 2: Use a Random Number Table

To ensure true randomness, we use a random number table - a pre-generated table of digits arranged with no predictable pattern.

86	13	84	10	07	30	39	05	97	96	88	07	37	26	04	89	13	48	19	20
60	78	48	12	99	47	09	46	91	33	17	21	03	94	79	00	08	50	40	16
78	48	06	37	82	26	01	06	64	65	94	41	17	26	74	66	61	93	24	97
80	56	90	79	66	94	18	40	97	79	93	20	41	51	25	04	20	71	76	04
99	09	39	25	66	31	70	56	30	15	52	17	87	55	31	11	10	68	98	23
56	32	32	72	91	65	97	36	56	61	12	79	95	17	57	16	53	58	96	36
66	02	49	93	97	44	99	15	56	86	80	57	11	78	40	23	58	40	86	14
31	77	53	94	05	93	56	14	71	23	60	46	05	33	23	72	93	10	81	23
98	79	72	43	14	76	54	77	66	29	84	09	88	56	75	86	41	67	04	42
50	97	92	15	10	01	57	01	87	33	73	17	70	18	40	21	24	20	66	62
90	51	94	50	12	48	88	95	09	34	09	30	22	27	25	56	40	76	01	59
31	99	52	24	13	43	27	88	11	39	41	65	00	84	13	06	31	79	74	97
22	96	23	34	46	12	67	11	48	06	99	24	14	83	78	37	65	73	39	47
06	84	55	41	27	06	74	59	14	29	20	14	45	75	31	16	05	41	22	96
08	64	89	30	25	25	71	35	33	31	04	56	12	67	03	74	07	16	49	32
86	87	62	43	15	11	76	49	79	13	78	80	93	89	09	57	07	14	40	74
94	44	97	13	77	04	35	02	12	76	60	91	93	40	81	06	85	85	72	84
63	25	55	14	66	47	99	90	02	90	83	43	16	01	19	69	11	78	87	16
11	22	83	98	15	21	18	57	53	42	91	91	26	52	89	13	86	00	47	61
01	70	10	83	94	71	13	67	11	12	36	54	53	32	90	43	79	01	95	15

Step 3: Select Your Sample

Let’s walk through the process together:

Choose a starting point randomly: Close your eyes and point to any position in the table. Let’s say you land on Row 3, Column 2.
Decide on a reading pattern: Since our student IDs are 4-digit numbers (0001-3000), we’ll read 4 digits at a time. We’ll read across rows from left to right.
Extract numbers systematically:
- Starting at Row 3, Column 2, we see “48”. Reading on to the right gives “48”, “06”, “37”, “82”, “26”, “01”, …
- Combine into 4-digit groups: 4806, 3782, 2601, 0664, 6594, 4117, 2674…
- 4806 is larger than 3000, so we skip it.
- 3782 is also too large, so we skip it as well. 2601 is valid: select student #2601!
Continue until you have 200 unique numbers:
- Skip any number that has already been selected and any invalid number (0000, or anything larger than 3000).
- When you reach the end of a row, carry on from the start of the next row.
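
The reading procedure can be sketched in Python. This is only an illustration: the function name `read_ids` is made up for this example, only Rows 3 and 4 of the table are typed in, and it assumes that reading continues from the end of Row 3 onto the start of Row 4.

```python
# Rows 3 and 4 of the random number table, copied as two-digit pairs.
ROW_3 = "78 48 06 37 82 26 01 06 64 65 94 41 17 26 74 66 61 93 24 97"
ROW_4 = "80 56 90 79 66 94 18 40 97 79 93 20 41 51 25 04 20 71 76 04"


def read_ids(pairs, population_size, width=4):
    """Join the pairs into one digit stream, cut it into `width`-digit
    groups, and keep each valid, not-yet-selected ID in reading order."""
    digits = "".join(pairs)
    selected = []
    for i in range(0, len(digits) - width + 1, width):
        candidate = int(digits[i:i + width])
        # Skip 0000, anything above the population size, and repeats.
        if 1 <= candidate <= population_size and candidate not in selected:
            selected.append(candidate)
    return selected


# Start at Row 3, Column 2 (drop the first pair of Row 3), then run on into Row 4.
pairs = ROW_3.split()[1:] + ROW_4.split()
sample_ids = read_ids(pairs, population_size=3000)
print(sample_ids)  # [2601, 664, 2674, 2041, 420]
```

Only 5 of the 19 four-digit groups are usable here, which gives a feel for how much of the table is skipped when IDs run only up to 3000.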

Discovering Problems: When Simple Random Sampling Falls Short

After implementing your simple random sampling survey, you encounter some real-world challenges:

The Key Realization: While simple random sampling is unbiased and theoretically perfect, it can be inefficient and might miss important subgroups by chance.

Making It Easier: Systematic Sampling

A More Efficient Approach

Let’s tackle the first problem: the efficiency concern. What if there was a way to maintain randomness while making the process more organized?

The New Plan: Instead of selecting completely random students, you’ll use the student database ordered alphabetically by name. You’ll select every $k$-th student from the list.

Definition: Systematic Sampling Systematic sampling is a method where you select every $k$-th member from an ordered sampling frame, starting from a randomly chosen position between 1 and $k$.

The sampling interval $k$ is calculated as: $k = \frac{\text{Population size}}{\text{Sample size}}$

How to Execute Systematic Sampling

Step 1: Calculate the Sampling Interval

$k = \frac{3000 \text{ students}}{200 \text{ sample size}} = 15$

This means you’ll select every 15th student from your list.

Step 2: Choose a Random Starting Point

Use your random number table to select a number between 1 and 15. Let’s say you get 12.

Step 3: Select Students Systematically

Starting with student #12, select every 15th student:

Student #12 (starting point), Student #27 ($12 + 15$)
Student #42 ($27 + 15$), Student #57 ($42 + 15$)
… continue until you reach student #2997 ($12 + 199 \times 15$)

You’ll automatically get exactly 200 students!
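
The three steps above can be sketched in Python. The function name `systematic_sample` is made up for this illustration, and the sketch assumes the population size is an exact multiple of the sample size, as it is for 3000 students and a sample of 200.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return every k-th ID from 1..population_size, beginning at `start`.

    Assumes population_size is an exact multiple of sample_size.
    """
    k = population_size // sample_size  # sampling interval
    if start is None:
        # Pick the starting point at random between 1 and k inclusive.
        start = random.Random(seed).randint(1, k)
    return list(range(start, population_size + 1, k))


# Reproduce the worked example: interval 15, starting point 12.
ids = systematic_sample(3000, 200, start=12)
print(ids[:4], ids[-1], len(ids))  # [12, 27, 42, 57] 2997 200
```

Whatever starting point between 1 and 15 is drawn, the list always contains exactly 200 IDs.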

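Advantages and Disadvantages of Systematic Sampling

Systematic sampling is easy to execute and spreads the sample evenly through the list. However, it has a critical weakness: periodicity, where a repeating pattern in the list lines up with the sampling interval.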

Example: The Apartment Building Survey

Scenario: A researcher wants to survey residents in a 20-floor apartment building. Each floor has 15 apartments numbered 01-15. The building database lists apartments in order: 101, 102, …, 115, 201, 202, …, 215, … 2001, …, 2015.

There are 300 total apartments. The researcher wants a sample of 20 apartments.

(a) Calculate the sampling interval $k$ : __________

(b) If the random starting point is the 7th apartment on the list, list the first 5 apartments you would survey:

(c) What is the potential problem with this sampling method?

Ensuring Representation: Stratified Sampling

The Representation Challenge Revisited

Remember our second problem? By chance, simple random sampling might give us too few heavy customers or miss important subgroups entirely. This is especially problematic when:

Different subgroups might have very different opinions
Some subgroups are small but critically important
You want to compare across subgroups

The Key Insight: What if we could guarantee representation from each important subgroup?

Example: HelloTea: Understanding Your Customer Base

Through preliminary research, you discover students fall into four distinct groups based on tea consumption:

Customer Type	Definition	Count
Heavy Users	$\geq$ 3 visits/week	300 students (10%)
Regular Users	1-2 visits/week	1,200 students (40%)
Light Users	1-3 visits/month	900 students (30%)
Rare/Non-users	$<$ 1 visit/month	600 students (20%)
Total		3,000 students (100%)

The Business Question: Heavy users contribute the most revenue - their opinions are crucial! How can you ensure they’re adequately represented in your sample of 200?
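
A small simulation can show how much the number of heavy users in a simple random sample of 200 changes from one draw to the next. This is a sketch under stated assumptions: the demo treats student IDs 1 to 300 as the heavy users (the source gives only the count, not which students they are), and the seed and the 1000 repetitions are arbitrary choices.

```python
import random

POPULATION = 3000
HEAVY_USERS = 300  # assumption for the demo: IDs 1..300 are the heavy users
SAMPLE_SIZE = 200


def heavy_users_in_srs(rng):
    """Draw one simple random sample and count the heavy users in it."""
    sample = rng.sample(range(1, POPULATION + 1), SAMPLE_SIZE)
    return sum(1 for student_id in sample if student_id <= HEAVY_USERS)


rng = random.Random(0)  # fixed seed so the demo is repeatable
counts = [heavy_users_in_srs(rng) for _ in range(1000)]
average = sum(counts) / len(counts)
print("fewest:", min(counts), "average:", average, "most:", max(counts))
```

The average should sit close to 20 (10% of 200), but individual samples can land well below or above that figure.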

Definition: Stratified Random Sampling Stratified random sampling divides the population into non-overlapping groups called strata (plural of stratum) based on specific characteristics. Then, simple random sampling is performed independently within each stratum.

How to Execute Stratified Sampling

Step 1: Identify Your Strata

Choose characteristics that are:

Relevant to your research question
Known before sampling (you must be able to classify population members)
Able to form groups that are internally similar but different from each other

For HelloTea: The four user types (Heavy, Regular, Light, Rare)

Step 2: Determine Sample Sizes for Each Stratum

Option A - Proportional Allocation: Match population proportions

Stratum	Population	Proportion	Sample Size
Heavy Users	300	10%	$200 \times 0.10 = 20$
Regular Users	1,200	40%	$200 \times 0.40 = 80$
Light Users	900	30%	$200 \times 0.30 = 60$
Rare/Non-users	600	20%	$200 \times 0.20 = 40$
Total	3,000	100%	200
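
Proportional allocation can be sketched in Python. The function name `proportional_allocation` is made up for this illustration. When the exact shares are not whole numbers, the sketch uses largest-remainder rounding so that the allocations still add up to the sample size. That rounding rule is an assumption, one common convention rather than something the table above prescribes.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to their population sizes.

    Largest-remainder rounding (an assumption) keeps the total exact.
    """
    total = sum(stratum_sizes.values())
    exact = {name: sample_size * size / total for name, size in stratum_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}  # round down
    shortfall = sample_size - sum(allocation.values())
    # Give the leftover places to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:shortfall]:
        allocation[name] += 1
    return allocation


hellotea = {"Heavy Users": 300, "Regular Users": 1200, "Light Users": 900, "Rare/Non-users": 600}
allocation = proportional_allocation(hellotea, 200)
print(allocation)
# {'Heavy Users': 20, 'Regular Users': 80, 'Light Users': 60, 'Rare/Non-users': 40}
```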

Step 3: Randomly Sample Within Each Stratum
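
A minimal sketch of this step in Python, using the proportional sample sizes from Step 2. The ID ranges assigned to each stratum are invented for the illustration (only the stratum counts come from the HelloTea table), and the function name `stratified_sample` is likewise hypothetical.

```python
import random

# Hypothetical sampling frame: student IDs grouped by customer type.
# The ID ranges are invented; only the counts (300, 1200, 900, 600) are from the table.
strata = {
    "Heavy Users": list(range(1, 301)),
    "Regular Users": list(range(301, 1501)),
    "Light Users": list(range(1501, 2401)),
    "Rare/Non-users": list(range(2401, 3001)),
}
sample_sizes = {"Heavy Users": 20, "Regular Users": 80, "Light Users": 60, "Rare/Non-users": 40}


def stratified_sample(strata, sample_sizes, seed=None):
    """Run a separate simple random sample (without replacement) in each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sample_sizes[name]) for name, members in strata.items()}


stratified = stratified_sample(strata, sample_sizes, seed=2024)
for name, chosen in stratified.items():
    print(name, len(chosen))
```

Unlike a single simple random sample of 200, every draw here contains exactly 20 heavy users.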

When You Don’t Have a List: Quota Sampling

The Practical Constraint

Imagine you’re conducting market research for HelloTea, but you encounter a new problem:

The Practical Alternative: Quota Sampling

Definition: Quota Sampling Quota sampling is a method where you divide the population into groups (like in stratified sampling) and set target numbers (quotas) for each group. However, instead of random selection, you use convenience sampling within each group until quotas are filled.

Key difference from stratified sampling: Selection within groups is non-random.

How to Execute Quota Sampling

Step 1: Define Your Quotas

Based on your knowledge of the customer base, you set the same targets as stratified sampling:

Customer Type	Quota
Heavy Users ($\geq 3$ visits/week)	20
Regular Users (1-2 visits/week)	80
Light Users (1-3 visits/month)	60
Rare/Non-users ($< 1$ visit/month)	40
Total	200

Step 2: Convenience Sampling Until Quotas Are Met

You station your survey team near HelloTea and the school cafeteria. They approach students and:

Ask a screening question: “How often do you visit HelloTea?”
Based on the answer, classify the student into a group
If that group’s quota isn’t full, conduct the survey
If the quota is full, politely decline and move to the next student
Stop when all quotas are filled

Example conversation:

Surveyor: “Excuse me, how often do you visit HelloTea?”
Student: “About twice a week.”
Surveyor checks: Regular Users quota: 65/80 filled $\checkmark$
Surveyor: “Great! Would you mind answering a few questions about your experience?”
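
The surveyor's decision rule can be sketched in Python. The function name `run_quota_survey`, the tiny quotas, and the respondent names are all made up for this illustration, and the sketch assumes every respondent falls into a group that has a quota.

```python
def run_quota_survey(respondents, quotas):
    """Approach respondents in arrival order and survey each one whose
    group quota is not yet full; stop once every quota is filled.

    respondents: iterable of (name, group) pairs in the order approached.
    """
    filled = {group: 0 for group in quotas}
    surveyed = []
    for name, group in respondents:
        if all(filled[g] >= quotas[g] for g in quotas):
            break  # all quotas are filled, so stop approaching students
        if filled[group] < quotas[group]:
            filled[group] += 1
            surveyed.append(name)
        # Otherwise this group's quota is full: decline and move on.
    return surveyed, filled


# Toy run with small hypothetical quotas.
toy_quotas = {"Heavy": 1, "Regular": 2}
arrivals = [("A", "Regular"), ("B", "Regular"), ("C", "Regular"), ("D", "Heavy"), ("E", "Heavy")]
surveyed, filled = run_quota_survey(arrivals, toy_quotas)
print(surveyed, filled)  # ['A', 'B', 'D'] {'Heavy': 1, 'Regular': 2}
```

Who ends up in the sample depends entirely on the order in which students happen to walk past, which is why selection within each group is non-random.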

Advantages and Disadvantages of Quota Sampling

Putting It All Together: Choosing the Right Method

Now that we’ve explored all four methods through the journey of solving real problems, let’s synthesize what we’ve learned.

Method	Random Selection?	Main Advantage	Main Disadvantage
Simple random	Yes - completely random	Unbiased; every possible sample is equally likely	Can be expensive; might miss subgroups by chance
Systematic	Partially - random start, then systematic	Easy to execute; ensures spread	Vulnerable to periodicity; less random than simple random sampling
Stratified	Yes - within each stratum	Guarantees subgroup representation; more precise	Requires knowing strata beforehand; more complex
Quota	No - convenience within quotas	Fast, cheap, no list needed	Non-probability; cannot calculate sampling error

Exercises

Past Paper Questions

A college manager wants to survey students’ opinions of enrichment activities. She decides to survey the students on the courses summarised in the table below.

Course	Number of students enrolled
Leisure and Sport	420
Information Technology	337
Health and Social Care	200
Media Studies	43

Each student takes only one course.

The manager has access to the college’s information system that holds full details of each of the enrolled students including name, address, telephone number and their course of study. She wants to compare the opinions of students on each course and has a generous budget to pay for the cost of the survey.

(a) Give one advantage and one disadvantage of carrying out this survey using (i) quota sampling, (ii) stratified sampling. (2)

The manager decides to take a stratified sample of 100 students.

(b) Calculate the number of students to be sampled from each course. (3)

A company wants to survey its employees’ attitudes to work. The company’s workforce is located at three offices. The number of employees at each location is summarised in the table below.

Office location	Number of employees
Bristol	856
Dudley	429
Glasgow	1215

Each employee is located at only one office.

A personnel assistant plans to survey the first 50 employees who arrive for work at the Bristol office on a Monday morning.

(a) Give two reasons why this survey is likely to lead to a biased response. (2)

A personnel manager has access to the company’s information system that holds details of each employee including their place of work.

The manager decides to take a stratified sample of 150 employees.

(b) Describe how to choose employees for this stratified sample. (3)

Reflection and Key Takeaways

The Deeper Lesson

Ethical Considerations

Sampling isn’t just technical - it’s ethical:

Representation: Do our sampling methods give everyone a fair voice?
Bias: Are we systematically excluding certain groups?
Transparency: Are we honest about our methods and limitations?
Misuse: Could our results be misinterpreted or misused?

Example: If an election poll surveys voters in only one district, it will miss all the voters in other districts, so their views will not be properly represented.

Connection to Broader Statistics

The sampling methods you’ve learned are foundational to:

Statistical inference: Drawing conclusions about populations from samples
Confidence intervals: Quantifying uncertainty in estimates
Goodness of fit tests: Testing how well a model fits the data
Rank correlation tests: Testing whether two variables are related in a monotonic way

Looking ahead: In future chapters, you’ll learn how to analyze the data collected through these sampling methods, test hypotheses, and draw rigorous conclusions with quantified uncertainty.
