S3 Chapter 1: Sampling Methods

From Guesswork to Science: How We Study Populations

Imagine you’re the manager of a popular bubble tea shop near your school, “HelloTea.” You want to know what students think about your store so you can make it even better.

Let’s begin with the most intuitive idea: what if we could give every student an equal chance of being selected, with no favoritism or bias? This is the foundation of scientific sampling.

Definition: Simple Random Sampling

Simple random sampling is a method where every member of the population has an equal and independent chance of being selected. Every possible sample of size $n$ has the same probability of being chosen.

The Core Principle: Complete randomness eliminates human bias. You’re not choosing students based on convenience, appearance, friendliness, or any other factor that might skew results.

Step 1: Create a Sampling Frame

First, you need a complete list of all students. Let’s say you obtain the school’s student database and assign each student a unique number from 0001 to 3000.

Step 2: Use a Random Number Table

To ensure true randomness, we use a random number table - a pre-generated table of digits arranged with no predictable pattern.

86 13 84 10 07 30 39 05 97 96 88 07 37 26 04 89 13 48 19 20
60 78 48 12 99 47 09 46 91 33 17 21 03 94 79 00 08 50 40 16
78 48 06 37 82 26 01 06 64 65 94 41 17 26 74 66 61 93 24 97
80 56 90 79 66 94 18 40 97 79 93 20 41 51 25 04 20 71 76 04
99 09 39 25 66 31 70 56 30 15 52 17 87 55 31 11 10 68 98 23
56 32 32 72 91 65 97 36 56 61 12 79 95 17 57 16 53 58 96 36
66 02 49 93 97 44 99 15 56 86 80 57 11 78 40 23 58 40 86 14
31 77 53 94 05 93 56 14 71 23 60 46 05 33 23 72 93 10 81 23
98 79 72 43 14 76 54 77 66 29 84 09 88 56 75 86 41 67 04 42
50 97 92 15 10 01 57 01 87 33 73 17 70 18 40 21 24 20 66 62
90 51 94 50 12 48 88 95 09 34 09 30 22 27 25 56 40 76 01 59
31 99 52 24 13 43 27 88 11 39 41 65 00 84 13 06 31 79 74 97
22 96 23 34 46 12 67 11 48 06 99 24 14 83 78 37 65 73 39 47
06 84 55 41 27 06 74 59 14 29 20 14 45 75 31 16 05 41 22 96
08 64 89 30 25 25 71 35 33 31 04 56 12 67 03 74 07 16 49 32
86 87 62 43 15 11 76 49 79 13 78 80 93 89 09 57 07 14 40 74
94 44 97 13 77 04 35 02 12 76 60 91 93 40 81 06 85 85 72 84
63 25 55 14 66 47 99 90 02 90 83 43 16 01 19 69 11 78 87 16
11 22 83 98 15 21 18 57 53 42 91 91 26 52 89 13 86 00 47 61
01 70 10 83 94 71 13 67 11 12 36 54 53 32 90 43 79 01 95 15

Step 3: Select Your Sample

Let’s walk through the process together:

  1. Choose a starting point randomly: Close your eyes and point to any position in the table. Let’s say you land on Row 3, Column 2.

  2. Decide on a reading pattern: Since our student IDs are 4-digit numbers (0001-3000), we’ll read 4 digits at a time. We’ll read across rows from left to right.

  3. Extract numbers systematically:

    • Starting at Row 3, Column 2: We see “48”. Continue reading: “48”, “06”, “37”
    • Combine into 4-digit groups: 4806, 3782, 2601, 0664, 6594, 4117, 2674…
    • Wait! 4806 is larger than 3000 - we skip it!
    • Continue: 3782 - skip (too large), 2601 - Select student #2601!
  4. Continue until you have 200 unique numbers:

    • Skip any number you have already selected, and skip any invalid number: 0000 or anything larger than 3000.
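
The reading procedure above can be sketched in Python. This is only an illustration, not part of the original lesson: the function name `read_ids` and its parameters are made up for this example, and only Row 3 of the table is used so the result can be compared with the walk-through.

```python
def read_ids(digits, start, max_id, sample_size, width=4):
    """Read fixed-width numbers from a string of random digits.

    digits      -- the random number table flattened into one string
    start       -- index of the first digit to read (0-based)
    max_id      -- largest valid ID; valid IDs run from 1 to max_id
    sample_size -- stop once this many unique IDs have been selected
    """
    selected = []
    # Step through the digits one group of `width` at a time.
    for i in range(start, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        # Skip 0000, numbers above max_id, and repeats.
        if 1 <= number <= max_id and number not in selected:
            selected.append(number)
        if len(selected) == sample_size:
            break
    return selected


# Row 3 of the table; Column 2 (the pair "48") begins at index 2.
row3 = "7848063782260106646594411726746661932497"
print(read_ids(row3, start=2, max_id=3000, sample_size=3))
# [2601, 664, 2674]
```

Student #0664 prints as 664 because converting to an integer drops the leading zero. If you are working on a computer rather than with a printed table, Python's standard `random.sample(range(1, 3001), 200)` should draw the same kind of sample directly.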

Discovering Problems: When Simple Random Sampling Falls Short

After implementing your simple random sampling survey, you encounter some real-world challenges.

The Key Realization: While simple random sampling is unbiased and theoretically perfect, it can be inefficient and might miss important subgroups by chance.

Let’s tackle the first problem: the efficiency concern. What if there was a way to maintain randomness while making the process more organized?

The New Plan: Instead of selecting completely random students, you’ll use the student database ordered alphabetically by name. You’ll select every $k$-th student from the list.

Definition: Systematic Sampling

Systematic sampling is a method where you select every $k$-th member from an ordered sampling frame, starting from a randomly chosen position between 1 and $k$.

The sampling interval $k$ is calculated as: $k = \frac{\text{Population size}}{\text{Sample size}}$

Step 1: Calculate the Sampling Interval

$$k = \frac{3000 \text{ students}}{200 \text{ sample size}} = 15$$

This means you’ll select every 15th student from your list.

Step 2: Choose a Random Starting Point

Use your random number table to select a number between 1 and 15. Let’s say you get 12.

Step 3: Select Students Systematically

Starting with student #12, select every 15th student:

  • Student #12 (starting point), Student #27 ($12 + 15$)
  • Student #42 ($27 + 15$), Student #57 ($42 + 15$)
  • … continue until you reach student #2997 ($12 + 199 \times 15$)

You’ll automatically get exactly 200 students!
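
The three steps can be sketched in a few lines of Python. The function name `systematic_sample` is made up for this example, and the sketch assumes the population size divides evenly by the sample size, as it does here.

```python
import random


def systematic_sample(population_size, sample_size, start=None):
    """Return the list positions chosen by systematic sampling."""
    k = population_size // sample_size  # sampling interval
    if start is None:
        start = random.randint(1, k)  # random start between 1 and k
    # Take `sample_size` positions, k apart, beginning at `start`.
    return [start + i * k for i in range(sample_size)]


chosen = systematic_sample(3000, 200, start=12)
print(chosen[:4])   # [12, 27, 42, 57]
print(chosen[-1])   # 2997
print(len(chosen))  # 200
```

Leaving out `start` lets the function pick the starting point at random, which is what you would do in a real survey.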

Advantages and Disadvantage of Systematic Sampling

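However, systematic sampling has a critical weakness: periodicity. If the list repeats in a cycle whose length lines up with the sampling interval $k$, every selected member can fall at the same point in the cycle, and the sample stops being representative.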

Example: The Apartment Building Survey

Scenario: A researcher wants to survey residents in a 20-floor apartment building. Each floor has 15 apartments numbered 01-15. The building database lists apartments in order: 101, 102, …, 115, 201, 202, …, 215, … 2001, …, 2015.

There are 300 total apartments. The researcher wants a sample of 20 apartments.

(a) Calculate the sampling interval $k$: __________

(b) If you randomly start with the 7th apartment on the list, list the first 5 apartments you would survey:

(c) List the last 2 apartments in your sample:

(d) What is the potential problem with this sampling method?
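
Once you have tried the questions by hand, one way to check your answers is to rebuild the apartment list and sample from it. This is a rough sketch with illustrative variable names. It assumes the starting point is the 7th apartment on the list.

```python
# Build the list in database order: 101..115, 201..215, ..., 2001..2015.
apartments = [floor * 100 + unit
              for floor in range(1, 21)
              for unit in range(1, 16)]

k = len(apartments) // 20   # sampling interval for a sample of 20
start = 7                   # the 7th apartment on the list

# List positions are 1-based, Python indexes are 0-based.
sample = [apartments[pos - 1]
          for pos in range(start, len(apartments) + 1, k)]

print(k)                 # part (a)
print(sample[:5])        # part (b)
print(sample[-2:])       # part (c)
# Which unit numbers (01-15) made it into the sample?
print({apt % 100 for apt in sample})
```

Compare the set printed on the last line with the layout of the building to see what goes wrong in part (d).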

Ensuring Representation: Stratified Sampling

Remember our second problem? By chance, simple random sampling might give us too few heavy customers or miss important subgroups entirely. This is especially problematic when:

  • Different subgroups might have very different opinions
  • Some subgroups are small but critically important
  • You want to compare across subgroups

The Key Insight: What if we could guarantee representation from each important subgroup?

Example: HelloTea: Understanding Your Customer Base

Through preliminary research, you discover students fall into four distinct groups based on tea consumption:

| Customer Type | Definition | Count |
| --- | --- | --- |
| Heavy Users | $\geq 3$ visits/week | 300 students (10%) |
| Regular Users | 1-2 visits/week | 1,200 students (40%) |
| Light Users | 1-3 visits/month | 900 students (30%) |
| Rare/Non-users | $< 1$ visit/month | 600 students (20%) |
| Total | | 3,000 students (100%) |

The Business Question: Heavy users contribute the most revenue - their opinions are crucial! How can you ensure they’re adequately represented in your sample of 200?

Definition: Stratified Random Sampling

Stratified random sampling divides the population into non-overlapping groups called strata (plural of stratum) based on specific characteristics. Then, simple random sampling is performed independently within each stratum.

Step 1: Identify Your Strata

Choose characteristics that are:

  • Relevant to your research question
  • Known before sampling (you must be able to classify population members)
  • Able to divide the population into groups that are internally similar but different from each other

For HelloTea: The four user types (Heavy, Regular, Light, Rare)

Step 2: Determine Sample Sizes for Each Stratum

Option A - Proportional Allocation: Match population proportions

| Stratum | Population | Proportion | Sample Size |
| --- | --- | --- | --- |
| Heavy Users | 300 | 10% | $200 \times 0.10 = 20$ |
| Regular Users | 1,200 | 40% | $200 \times 0.40 = 80$ |
| Light Users | 900 | 30% | $200 \times 0.30 = 60$ |
| Rare/Non-users | 600 | 20% | $200 \times 0.20 = 40$ |
| Total | 3,000 | 100% | 200 |
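
The proportional allocation in the table can be reproduced with a short calculation. The variable names below are illustrative. Integer division is safe only because these numbers divide exactly, and other populations would need a rounding rule.

```python
strata_sizes = {
    "Heavy Users": 300,
    "Regular Users": 1200,
    "Light Users": 900,
    "Rare/Non-users": 600,
}
total_sample = 200
population = sum(strata_sizes.values())  # 3000

# Each stratum's share of the sample equals its share of the population.
allocation = {name: total_sample * size // population
              for name, size in strata_sizes.items()}
print(allocation)
# {'Heavy Users': 20, 'Regular Users': 80, 'Light Users': 60, 'Rare/Non-users': 40}
```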

Step 3: Randomly Sample Within Each Stratum
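
Step 3 amounts to running a separate simple random sample inside each stratum. Here is a minimal sketch, assuming you already know which students belong to which stratum. The function name `stratified_sample` and the ID ranges are hypothetical and chosen only so the stratum sizes match the table.

```python
import random


def stratified_sample(strata, allocation, seed=None):
    """Draw a simple random sample independently within each stratum.

    strata     -- dict mapping stratum name to a list of member IDs
    allocation -- dict mapping stratum name to its sample size
    """
    rng = random.Random(seed)
    return {name: rng.sample(members, allocation[name])
            for name, members in strata.items()}


# Hypothetical frame: IDs are grouped into strata purely for illustration.
strata = {
    "Heavy Users": list(range(1, 301)),
    "Regular Users": list(range(301, 1501)),
    "Light Users": list(range(1501, 2401)),
    "Rare/Non-users": list(range(2401, 3001)),
}
allocation = {"Heavy Users": 20, "Regular Users": 80,
              "Light Users": 60, "Rare/Non-users": 40}

sample = stratified_sample(strata, allocation, seed=1)
print({name: len(ids) for name, ids in sample.items()})
# {'Heavy Users': 20, 'Regular Users': 80, 'Light Users': 60, 'Rare/Non-users': 40}
```

Whatever the random draw, the 20 heavy users are guaranteed to be in the sample, which simple random sampling cannot promise.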

When You Don’t Have a List: Quota Sampling

Imagine you’re conducting market research for HelloTea, but you encounter a new problem: you have no complete list of students to sample from.

Definition: Quota Sampling

Quota sampling is a method where you divide the population into groups (like in stratified sampling) and set target numbers (quotas) for each group. However, instead of random selection, you use convenience sampling within each group until quotas are filled.

Key difference from stratified sampling: Selection within groups is non-random.

Step 1: Define Your Quotas

Based on your knowledge of the customer base, you set the same targets as in the stratified sampling plan:

| Customer Type | Quota |
| --- | --- |
| Heavy Users ($\geq 3$ visits/week) | 20 |
| Regular Users (1-2 visits/week) | 80 |
| Light Users (1-3 visits/month) | 60 |
| Rare/Non-users ($< 1$ visit/month) | 40 |
| Total | 200 |

Step 2: Convenience Sampling Until Quotas Are Met

You station your survey team near HelloTea and the school cafeteria. They approach students and:

  1. Ask a screening question: “How often do you visit HelloTea?”
  2. Based on the answer, classify the student into a group
  3. If that group’s quota isn’t full, conduct the survey
  4. If the quota is full, politely decline and move to the next student
  5. Stop when all quotas are filled

Example conversation:

  • Surveyor: “Excuse me, how often do you visit HelloTea?”
  • Student: “About twice a week.”
  • Surveyor checks: Regular Users quota: 65/80 filled ✓
  • Surveyor: “Great! Would you mind answering a few questions about your experience?”
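
The screening routine can be sketched as a simple loop. This is an illustration under stated assumptions: the function name `run_quota_survey` is made up, each passer-by arrives already classified by the screening question, and the tiny quotas are invented so the result is easy to follow.

```python
def run_quota_survey(passersby, quotas):
    """Fill each group's quota from a stream of passers-by.

    passersby -- iterable of (student, group) pairs in the order they walk
                 past; `group` is the result of the screening question
    quotas    -- dict mapping group name to the number of responses wanted
    """
    surveyed = {group: [] for group in quotas}
    for student, group in passersby:
        if len(surveyed[group]) < quotas[group]:
            surveyed[group].append(student)  # quota not full: survey them
        # Otherwise the quota is full: politely decline and move on.
        if all(len(surveyed[g]) == quotas[g] for g in quotas):
            break  # every quota is filled
    return surveyed


# Made-up example: quotas of 1 heavy user and 2 regular users.
stream = [("A", "Regular"), ("B", "Regular"), ("C", "Regular"),
          ("D", "Heavy"), ("E", "Heavy")]
print(run_quota_survey(stream, {"Heavy": 1, "Regular": 2}))
# {'Heavy': ['D'], 'Regular': ['A', 'B']}
```

Students C and E are never surveyed simply because they walked past too late. Who ends up in the sample depends on who happens to come by first, not on a random draw.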

Advantages and Disadvantages of Quota Sampling

Putting It All Together: Choosing the Right Method

Now that we’ve explored all four methods through the journey of solving real problems, let’s synthesize what we’ve learned.

| Method | Random Selection? | Main Advantage | Main Disadvantage |
| --- | --- | --- | --- |
| Simple | Yes - completely random | Unbiased, theoretically perfect | Can be expensive; might miss subgroups by chance |
| Systematic | Partially - random start, then systematic | Easy to execute; ensures spread | Vulnerable to periodicity; less random than simple random sampling |
| Stratified | Yes - within each stratum | Guarantees subgroup representation; more precise | Requires knowing strata beforehand; more complex |
| Quota | No - convenience within quotas | Fast, cheap, no list needed | Non-probability; cannot calculate sampling error |

Sampling isn’t just technical - it’s ethical:

  • Representation: Do our sampling methods give everyone a fair voice?
  • Bias: Are we systematically excluding certain groups?
  • Transparency: Are we honest about our methods and limitations?
  • Misuse: Could our results be misinterpreted or misused?

Example: If an election poll surveys voters in only one district, it misses the voters in every other district, so their views are not represented in the results.

The sampling methods you’ve learned are foundational to:

  • Statistical inference: Drawing conclusions about populations from samples
  • Confidence intervals: Quantifying uncertainty in estimates
  • Goodness of fit tests: Testing how well a model fits the data
  • Rank correlation tests: Testing whether two variables are related in a monotonic way

Looking ahead: In future chapters, you’ll learn how to analyze the data collected through these sampling methods, test hypotheses, and draw rigorous conclusions with quantified uncertainty.