S3 Chapter 6: Goodness of Fit and Contingency Tables
Introduction: The Challenge of Randomness
The “Honest Dice” Project
Imagine you are a board game enthusiast. You come across a Kickstarter project called “The Honest Dice.” The founders claim that through precision engineering and special materials, they have created the fairest dice in history. They assert that the probability of rolling any face is exactly $\frac{1}{6}$, unlike standard mass-produced dice, which inevitably have manufacturing imperfections.
But here is the problem:
- Every physical object has imperfections.
- In a game session, these tiny biases can accumulate over hundreds of rolls.
- How can you scientifically verify if this “Honest Dice” is actually fairer than a cheap plastic one?
- More importantly, how can the founders provide convincing statistical evidence to potential backers?
Beyond Board Games: Digital Randomness
This problem isn’t just about dice. In the digital world, verifying randomness is even more critical:
- Online Gambling: How do regulators verify that a digital slot machine is fair?
- Lucky Draws: How do we know a promotional lottery isn’t rigged?
- Cryptography: Security relies on random number generators. If a pattern exists, hackers might exploit it.
This chapter introduces the Chi-Square ($\chi^2$) tests, a statistical framework for answering these questions by comparing what we see (data) with what we expect (theory).
Goodness of Fit Test: Is the Die Fair?
The Core Idea
The Fundamental Question: How large is the discrepancy between our observed data and the theoretical prediction? Is this discrepancy just due to random chance, or does it indicate a systematic bias?
The Logic: If the die is truly fair, the observed frequency of each face should be “close enough” to the expected frequency. If the difference is “too large,” we suspect the die is not fair.
Theory and Methodology
Setting the Hypotheses
We start by defining the null hypothesis ($H_0$), which represents the status quo or the theoretical distribution we are testing against.
- $H_0$: The data follow the specified distribution (e.g., the die is fair).
- $H_1$: The data do not follow the specified distribution.
Note: We never “prove” that $H_0$ is true. We only check whether there is strong evidence to reject it.
Calculating Expected Frequencies
If $H_0$ is true, what should we see? We calculate the Expected Frequency ($E_i$) for each category $i$.
$$E_i = N \times p_i$$
where:
- $N$ is the total sample size (total number of trials).
- $p_i$ is the theoretical probability of category $i$ under $H_0$.
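As a minimal sketch of this step, the snippet below multiplies the sample size by each hypothesised probability to get the expected frequencies for a fair die rolled 600 times. The function name `expected_frequencies` is made up for illustration, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction


def expected_frequencies(total, probabilities):
    """Return E_i = N * p_i for each category under H0."""
    if sum(probabilities) != 1:
        raise ValueError("probabilities under H0 must sum to 1")
    return [total * p for p in probabilities]


# Fair six-sided die rolled 600 times: p_i = 1/6 for every face.
fair_die = [Fraction(1, 6)] * 6
expected = expected_frequencies(600, fair_die)
print([float(e) for e in expected])
```

The expected frequencies always add up to the sample size, which is the constraint that later costs one degree of freedom.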
The Chi-Square Statistic
We need a single number to summarize the total discrepancy between Observed ($O_i$) and Expected ($E_i$) values. We use the Chi-Square statistic, summed over all $k$ categories:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
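The statistic is the sum, over all categories, of the squared difference between observed and expected frequency divided by the expected frequency. A short sketch of that calculation follows; the coin data are hypothetical and the function name `chi_square_statistic` is chosen here for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: a coin tossed 100 times lands heads 48 times.
coin_observed = [48, 52]
coin_expected = [50, 50]
coin_chi2 = chi_square_statistic(coin_observed, coin_expected)
print(round(coin_chi2, 2))
```

Dividing by the expected frequency puts every category on a comparable scale, so a gap of 2 counts for more when only 5 were expected than when 500 were.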
Degrees of Freedom
The shape of the Chi-Square distribution depends on the degrees of freedom ($\nu$).
$$\nu = k - 1 - m$$
where:
- $k$ = Number of categories (bins).
- $1$ = Constraint due to fixed sample size (knowing $k - 1$ frequencies determines the last one).
- $m$ = Number of population parameters estimated from the sample data to calculate expected frequencies.
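The rule above can be sketched as a small helper. Here `k` stands for the number of categories (counted after any pooling) and `m` for the number of estimated parameters; these symbol names and the function name are assumptions made for this sketch. The three calls mirror the three goodness-of-fit examples in this chapter.

```python
def degrees_of_freedom(k, m=0):
    """Degrees of freedom: categories, minus 1, minus estimated parameters."""
    nu = k - 1 - m
    if nu < 1:
        raise ValueError("too few categories for this many estimated parameters")
    return nu


print(degrees_of_freedom(6))      # fair die, nothing estimated
print(degrees_of_freedom(3, 1))   # binomial, p estimated, 3 categories after pooling
print(degrees_of_freedom(5, 2))   # normal, mean and s.d. estimated, 5 bins
```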
Example 1: The Uniform Distribution (The Honest Dice)
Let’s test the “Honest Dice.” We roll it 600 times.
Observed Data:
| Face | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Observed ($O_i$) | 98 | 102 | 95 | 105 | 96 | 104 |
Task: Test, at the 5% level of significance, whether or not a uniform distribution is a suitable model for these data. State your hypotheses and show your working clearly.
Example 2: The Binomial Distribution
A basketball player shoots 3 free throws per game. We record the number of successful shots ($X$) in each of 100 games. We want to test whether $X \sim \text{B}(3, p)$.
Observed Data:
| $x$ (successes) | 0 | 1 | 2 | 3 | Total |
|---|---|---|---|---|---|
| Observed Freq ($O_i$) | 45 | 40 | 13 | 2 | 100 |
Task:
(a) Show that the estimated probability of a successful shot is $0.24$.
(b) Test, at the 5% level of significance, whether or not a binomial distribution is a suitable model for these data. State your hypotheses and show your working clearly.
Example 3: The Normal Distribution
Testing if continuous data follows a Normal distribution is slightly more complex because:
- The Normal distribution is continuous, but the chi-square test requires discrete categories.
- We typically don’t know the true $\mu$ and $\sigma$, so we must estimate them from the data.
The Solution: Binning
We divide the continuous range into intervals (bins) and count how many observations fall into each bin. This converts continuous data into a frequency table.
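A minimal sketch of binning, using only the Python standard library: it counts observations per bin and computes the expected count per bin under a Normal model. The scores, bin edges, mean and standard deviation below are all hypothetical values chosen for illustration, not the teacher's data.

```python
from statistics import NormalDist


def bin_counts(values, edges):
    """Count observations in (-inf, e1), [e1, e2), ..., [e_last, inf)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        index = sum(1 for e in edges if v >= e)
        counts[index] += 1
    return counts


def normal_expected(total, mean, sd, edges):
    """Expected count per bin if the data were N(mean, sd^2)."""
    model = NormalDist(mean, sd)
    cdf = [0.0] + [model.cdf(e) for e in edges] + [1.0]
    return [total * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


# Hypothetical scores and bin edges, for illustration only.
scores = [42, 55, 58, 61, 64, 66, 69, 71, 77, 85]
edges = [50, 60, 70, 80]
observed = bin_counts(scores, edges)
expected = normal_expected(len(scores), mean=65, sd=10, edges=edges)
print(observed)
print([round(e, 2) for e in expected])
```

The outer bins are left open-ended so that the bin probabilities add up to 1. A sample this small would fail the Rule of 5, so it only illustrates the mechanics.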
Detailed Example: Testing Normality of Exam Scores
A teacher suspects that exam scores follow a Normal distribution. She collects 100 scores and groups them into bins:
| Score Range | — | — | — | ||
|---|---|---|---|---|---|
| Observed ($O_i$) | 8 | 22 | 35 | 25 | 10 |
From the raw data (before binning), she calculates:
- Sample mean:
- Sample standard deviation:
Task:
(a) Assuming the scores follow a distribution, show that the expected frequency for the ”—” bin is approximately . (b) Given the expected frequencies for the five bins are roughly , test, at the 5% level of significance, whether or not a normal distribution is a suitable model for these data. State your hypotheses and show your working clearly.
Contingency Tables: Testing for Independence
Introduction: The Mysterious Case of the Titanic
On April 15, 1912, the RMS Titanic sank after hitting an iceberg. Of the 2,224 passengers and crew, more than 1,500 died. In the aftermath, a troubling question arose:
Was survival related to passenger class?
The “Women and children first” protocol was supposed to apply equally, but rumors suggested that first-class passengers had better access to lifeboats. How can we statistically test whether survival was independent of social class, or whether there was a significant association?
| | Survived | Died | Total |
|---|---|---|---|
| 1st Class | 203 | 122 | 325 |
| 2nd Class | 118 | 167 | 285 |
| 3rd Class | 178 | 528 | 706 |
| Total | 499 | 817 | 1316 |
At first glance, the survival rate for 1st class ($203/325 \approx 62\%$) seems much higher than for 3rd class ($178/706 \approx 25\%$). But could this difference be due to random chance? This is exactly the question that a Chi-Square Test for Independence can answer.
Are Two Variables Related?
Often we want to know if two categorical variables are related.
- Is gender related to voting preference?
- Is a new drug treatment related to recovery rate?
- Was survival on the Titanic related to passenger class?
Theory: Defining Independence
Two events $A$ and $B$ are independent if knowing that $A$ occurred gives no information about $B$. Mathematically:
$$P(A \cap B) = P(A) \times P(B)$$
The Test Procedure
Hypotheses:
- $H_0$: The two variables are independent (no association).
- $H_1$: The two variables are not independent (there is an association).
Expected Frequencies: If $H_0$ is true, the probability of falling into cell $(i, j)$ depends only on the row and column totals.
$$E_{ij} = \frac{(\text{Row Total})_i \times (\text{Column Total})_j}{\text{Grand Total}}$$
Degrees of Freedom: For a table with $r$ rows and $c$ columns:
$$\nu = (r - 1)(c - 1)$$
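As a rough sketch of the procedure, the code below applies it to the Titanic table from the introduction. Each expected frequency is row total times column total divided by the grand total, and the degrees of freedom are (rows minus 1) times (columns minus 1). The function names are chosen for illustration.

```python
def expected_table(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def independence_test(observed):
    """Return (chi-square statistic, degrees of freedom) for a table."""
    expected = expected_table(observed)
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof


# Rows: 1st, 2nd, 3rd class; columns: survived, died.
titanic = [[203, 122], [118, 167], [178, 528]]
chi2, dof = independence_test(titanic)
print(round(chi2, 1), dof)
```

With 2 degrees of freedom the 5% critical value is 5.991, and the statistic for this table comes out far above it, so the data are not consistent with survival being independent of class.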
Example: Coffee Preference vs. Time of Day
A cafe wants to know if drink preference depends on the time of day. They survey 200 customers.
| | Morning | Afternoon | Evening | Total |
|---|---|---|---|---|
| Latte | 70 | 25 | 5 | 100 |
| Espresso | 50 | 47 | 3 | 100 |
| Total | 120 | 72 | 8 | 200 |
Task:
- State the hypotheses $H_0$ and $H_1$.
- Calculate the Expected Frequencies table. Check the Rule of 5! If necessary, pool columns to ensure all expected frequencies are $\geq 5$.
- Calculate the $\chi^2$ statistic.
- Determine the degrees of freedom (based on the new table) and find the critical value at the 5% significance level.
- Conclude whether coffee preference is independent of the time of day.
Challenge: Why Does the Chi-Square Statistic Follow a Chi-Square Distribution?
This section guides you through understanding why our test statistic follows a Chi-Square distribution. This is a challenging but rewarding exploration!
What is the Chi-Square Distribution?
Definition: Chi-Square Distribution. If $Z_1, Z_2, \ldots, Z_\nu$ are independent standard normal random variables ($Z_i \sim N(0, 1)$), then the sum of their squares
$$Q = Z_1^2 + Z_2^2 + \cdots + Z_\nu^2$$
follows a Chi-Square distribution with $\nu$ degrees of freedom, written $Q \sim \chi^2_\nu$.
Key Insight: The Chi-Square distribution is fundamentally about sums of squared standard normal variables.
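The definition can be checked informally by simulation. The sketch below draws sums of five squared standard normal values and looks at two properties of a Chi-Square distribution with 5 degrees of freedom: its mean is 5, and about 5% of values exceed the tabulated critical value 11.070. The seed and number of trials are arbitrary choices.

```python
import random


def simulate_sum_of_squares(dof, trials, seed=0):
    """Draw `trials` values of Z_1^2 + ... + Z_dof^2 with Z_i ~ N(0, 1)."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))
        for _ in range(trials)
    ]


samples = simulate_sum_of_squares(dof=5, trials=20000)
sample_mean = sum(samples) / len(samples)
share_above = sum(s > 11.070 for s in samples) / len(samples)
print(round(sample_mean, 2), round(share_above, 3))
```

The printed values should land close to 5 and 0.05, up to simulation noise.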
Connecting to Our Statistic
Our goal is to show that $\sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$ approximately follows a $\chi^2_\nu$ distribution when $H_0$ is true.
Part E: Why Estimating Parameters Costs More Degrees of Freedom
(i) When we estimate parameters from the data, we impose additional constraints (the estimated parameters must “fit” the data in some optimal way).
(ii) Each constraint removes one degree of freedom from the $(k - 1)$-dimensional space.
(iii) Final result: $\nu = k - 1 - m$.
(iv) Example: For testing normality with 5 bins:
- $k = 5$ categories
- $m = 2$ (estimate $\mu$ and $\sigma$)
- $\nu = 5 - 1 - 2 = 2$
Explain in your own words why estimating $\mu$ from $\bar{x}$ and $\sigma$ from $s$ each “uses up” one degree of freedom.
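- Each standardized difference $Z_i = \frac{O_i - E_i}{\sqrt{E_i}}$ is approximately $N(0, 1)$ for large $N$.
- Squaring gives $Z_i^2 = \frac{(O_i - E_i)^2}{E_i}$.
- Summing over $k$ categories would give $\chi^2_k$, but…
- The constraint $\sum O_i = \sum E_i = N$ introduces a dependency, reducing $\nu$ by 1.
- Estimating $m$ parameters further reduces $\nu$ by $m$.
- Final result: $\sum \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{k - 1 - m}$ (approximately) under $H_0$.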
This is why we can use Chi-Square tables to find critical values!
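One informal way to see this is to simulate many sets of 600 rolls of a truly fair die and compute the statistic each time. If the argument holds, the simulated values should average about 5 (the degrees of freedom for six faces) and exceed the tabulated 5% critical value 11.070 roughly 5% of the time. This is a sketch with arbitrary seed and trial count, not a proof.

```python
import random
from collections import Counter


def simulate_fair_die_statistics(rolls, trials, seed=1):
    """Chi-square statistic for each of `trials` simulated fair-die experiments."""
    rng = random.Random(seed)
    expected = rolls / 6
    stats = []
    for _ in range(trials):
        counts = Counter(rng.choices(range(1, 7), k=rolls))
        stats.append(
            sum((counts[face] - expected) ** 2 / expected for face in range(1, 7))
        )
    return stats


stats = simulate_fair_die_statistics(rolls=600, trials=2000)
mean_stat = sum(stats) / len(stats)
rejection_rate = sum(s > 11.070 for s in stats) / len(stats)
print(round(mean_stat, 2), round(rejection_rate, 3))
```

The rejection rate is the share of fair dice that the test would wrongly flag, which is what a 5% significance level means.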
Homework Exercises
Eight tasks were given to each of 125 randomly selected job applicants. The number of tasks failed by each applicant is recorded. The results are as follows.
| Number of tasks failed by an applicant | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more |
|---|---|---|---|---|---|---|---|
| Frequency | 2 | 21 | 45 | 42 | 12 | 3 | 0 |
(a) Show that the probability of a randomly selected task from this sample being failed is $0.3$.
An employer believes that a binomial distribution might provide a good model for the number of tasks, out of 8, that an applicant fails. He uses a binomial distribution, with the estimated probability of a task being failed. The calculated expected frequencies are as follows.
| Number of tasks failed by an applicant | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more |
|---|---|---|---|---|---|---|---|
| Expected frequency | 7.21 | 24.71 | 37.06 | $r$ | 17.02 | 5.83 | $s$ |
(b) Find the value of $r$ and the value of $s$, giving your answers to 2 decimal places.
(c) Test, at the 5% level of significance, whether or not a binomial distribution is a suitable model for these data. State your hypotheses and show your working clearly.
(d) The employer believes that all applicants have the same probability of failing each task. Use your result from part (c) to comment on this belief.
A number of males and females were asked to rate their happiness under the headings “not happy”, “fairly happy” and “very happy”. The results are shown in the table below.
| Gender \ Happiness | Not happy | Fairly happy | Very happy | Total |
|---|---|---|---|---|
| Female | 9 | 43 | 34 | 86 |
| Male | 13 | 25 | 16 | 54 |
| Total | 22 | 68 | 50 | 140 |
Stating your hypotheses, test at the 5% level of significance, whether or not there is evidence of an association between happiness and gender. Show your working clearly.
Appendix: Solutions to Hands-on Examples
Solution to Example 1: The Honest Dice
1. Hypotheses
- $H_0$: A uniform distribution is a suitable model for these data ($p_i = \frac{1}{6}$ for every face).
- $H_1$: A uniform distribution is not a suitable model for these data.
2. Expected Frequencies
Total $N = 600$. Under $H_0$, $E_i = 600 \times \frac{1}{6} = 100$ for all faces.
3. Calculate $\chi^2$
$$\chi^2 = \frac{(98-100)^2 + (102-100)^2 + (95-100)^2 + (105-100)^2 + (96-100)^2 + (104-100)^2}{100} = \frac{90}{100} = 0.90$$
4. Degrees of Freedom and Critical Value
Number of categories $k = 6$, number of estimated parameters $m = 0$ (probabilities are given by the definition of a fair die). $\nu = 6 - 1 - 0 = 5$. The critical value $\chi^2_5(5\%)$ is 11.070.
5. Conclusion
$0.90 < 11.070$. Fail to reject $H_0$. There is insufficient evidence to suggest the die is unfair; a uniform distribution is a suitable model.
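The arithmetic for Example 1 can be double-checked with a few lines of Python. This is only a sketch of the check; the critical value is copied from tables rather than computed.

```python
observed = [98, 102, 95, 105, 96, 104]   # faces 1 to 6
total = sum(observed)                    # 600 rolls
expected = [total / 6] * 6               # 100 per face under H0

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical_value = 11.070                  # 5% point for 5 degrees of freedom, from tables
reject = chi2 > critical_value
print(round(chi2, 2), reject)
```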
Solution to Example 2: The Binomial Distribution
1. Hypotheses
- $H_0$: A binomial distribution is a suitable model for these data.
- $H_1$: A binomial distribution is not a suitable model for these data.
2. Estimate $p$
Total shots = 300. Total successes = $0(45) + 1(40) + 2(13) + 3(2) = 72$. $\hat{p} = \frac{72}{300} = 0.24$.
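3. Expected Frequencies (Before Pooling)
Using $E_i = 100 \times P(X = x)$ with $X \sim \text{B}(3, 0.24)$:
| $x$ | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| $E_i$ | 43.90 | 41.59 | 13.13 | 1.38 |
4. Rule of 5 & Pooling
The expected frequency for $x = 3$ is $1.38 < 5$, so we must pool $x = 2$ and $x = 3$.
| $x$ (New Categories) | 0 | 1 | $\geq 2$ |
|---|---|---|---|
| $O_i$ | 45 | 40 | 15 |
| $E_i$ | 43.90 | 41.59 | 14.51 |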
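5. Calculate $\chi^2$
$$\chi^2 = \frac{(45-43.90)^2}{43.90} + \frac{(40-41.59)^2}{41.59} + \frac{(15-14.51)^2}{14.51} \approx 0.105$$
6. Degrees of Freedom
Number of categories $k = 3$ (after pooling!), number of estimated parameters $m = 1$ (estimated $p$). $\nu = 3 - 1 - 1 = 1$.
7. Conclusion
The critical value $\chi^2_1(5\%)$ is 3.841. $0.105 < 3.841$. Fail to reject $H_0$. A binomial distribution is a suitable model.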
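The steps of this solution can be sketched in Python as a check: estimate the success probability, compute binomial expected frequencies, pool the last two categories, then compute the statistic. Variable names are chosen for illustration, and the pooling is hard-coded to match the solution rather than automated.

```python
from math import comb

observed = {0: 45, 1: 40, 2: 13, 3: 2}   # successes per game -> number of games
n_shots = 3
games = sum(observed.values())

# Estimate p as total successes divided by total shots.
p_hat = sum(x * f for x, f in observed.items()) / (n_shots * games)

# Binomial expected frequencies before pooling.
expected = {
    x: games * comb(n_shots, x) * p_hat ** x * (1 - p_hat) ** (n_shots - x)
    for x in observed
}

# The expected frequency for x = 3 is below 5, so pool x = 2 and x = 3.
pooled_observed = [observed[0], observed[1], observed[2] + observed[3]]
pooled_expected = [expected[0], expected[1], expected[2] + expected[3]]

chi2 = sum((o - e) ** 2 / e for o, e in zip(pooled_observed, pooled_expected))
dof = len(pooled_observed) - 1 - 1       # one parameter (p) was estimated
print(round(p_hat, 2), round(chi2, 2), dof)
```

Because the code keeps unrounded expected frequencies, its statistic can differ from the hand calculation in the third decimal place.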
Solution to Example 3: The Normal Distribution
1. Hypotheses
- $H_0$: A normal distribution is a suitable model for these data.
- $H_1$: A normal distribution is not a suitable model for these data.
2. Probability for bin 50-60
3. Expected Frequency
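4. Calculate $\chi^2$
5. Degrees of Freedom
Number of bins $k = 5$, number of estimated parameters $m = 2$ (estimated $\mu$ and $\sigma$). $\nu = 5 - 1 - 2 = 2$. The critical value $\chi^2_2(5\%)$ is 5.991.
6. Conclusion
The calculated $\chi^2$ is less than 5.991. Fail to reject $H_0$. A normal distribution is a suitable model.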
Solution to Contingency Table
1. Hypotheses
- $H_0$: Coffee preference is independent of time of day.
- $H_1$: Coffee preference is not independent of time of day.
2. Expected Frequencies (Before Pooling)
| $E_{ij}$ (Expected) | Morning | Afternoon | Evening |
|---|---|---|---|
| Latte | 60 | 36 | 4 |
| Espresso | 60 | 36 | 4 |
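3. Rule of 5 & Pooling
Since both expected frequencies in the “Evening” column are $4 < 5$, we must pool the “Afternoon” and “Evening” columns.
| $O_{ij}$ (Observed) | Morning | Afternoon/Evening |
|---|---|---|
| Latte | 70 | 30 |
| Espresso | 50 | 50 |

| $E_{ij}$ (Expected) | Morning | Afternoon/Evening |
|---|---|---|
| Latte | 60 | 40 |
| Espresso | 60 | 40 |
4. Calculate $\chi^2$
$$\chi^2 = \frac{(70-60)^2}{60} + \frac{(30-40)^2}{40} + \frac{(50-60)^2}{60} + \frac{(50-40)^2}{40} \approx 8.33$$
5. Degrees of Freedom
$\nu = (2 - 1)(2 - 1) = 1$ (using the pooled table!). The critical value $\chi^2_1(5\%)$ is 3.841.
6. Conclusion
$8.33 > 3.841$. Reject $H_0$. There is significant evidence of an association between coffee preference and time of day.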
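The coffee solution, including the pooling step, can be checked with a short script. This is a sketch under the same assumptions as the worked solution: the “Afternoon” and “Evening” columns are merged by hand, and the critical value is taken from tables.

```python
def expected_table(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Chi-square statistic for a two-way table of observed counts."""
    expected = expected_table(observed)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


# Rows: latte, espresso; columns: morning, afternoon, evening.
coffee = [[70, 25, 5], [50, 47, 3]]
needs_pooling = any(e < 5 for row in expected_table(coffee) for e in row)

# Merge the afternoon and evening columns, as in the solution.
pooled = [[row[0], row[1] + row[2]] for row in coffee]
chi2 = chi_square(pooled)
dof = (len(pooled) - 1) * (len(pooled[0]) - 1)
reject = chi2 > 3.841                    # 5% point for 1 degree of freedom, from tables
print(needs_pooling, round(chi2, 2), dof, reject)
```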