Core Motivation
In real decisions, we almost never use one raw measurement alone. We use totals, differences, weighted averages, and combined scores:
- total weight of multiple products in quality control,
- difference between morning and evening blood pressure,
- average test score from several sections,
- total delivery time for multi-stage logistics.
All of these require combinations of random variables. If students mix up expectation rules, variance rules, or independence conditions, all later inference becomes fragile.
Roadmap
How to use this handout:
Section 1: Normal distribution basics and intuition.
Section 2: Expectation and linear combinations.
Section 3: Variance and the independence checkpoint.
Section 4: Combination of normal distributions.
Section 5: Class practice in past-paper style.
Section 6: Three transferred exercises from the last homework set.
Definition: Normal Distribution
A random variable $X$ is normally distributed with mean $\mu$ and variance $\sigma^2$, written
$$X \sim N(\mu,\sigma^2).$$
- $\mu$: center/location (expected value),
- $\sigma$: spread/typical deviation,
- $\sigma^2$: variance.
Theorem: Linearity of Expectation
For any random variables $X_1,\ldots,X_n$ and constants $a_1,\ldots,a_n,b$,
$$E\!\left(\sum_{i=1}^{n} a_iX_i + b\right)=\sum_{i=1}^{n}a_iE(X_i)+b.$$
No independence assumption is needed.
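A quick simulation sketch (Python standard library; the distributions and coefficients below are illustrative, not from the handout) shows that linearity holds even when the variables are strongly dependent:

```python
import random

random.seed(1)
n = 200_000

# X ~ N(2, 1); Y is built from X, so X and Y are NOT independent.
xs = [random.gauss(2, 1) for _ in range(n)]
ys = [0.5 * x + random.gauss(1, 1) for x in xs]   # E(Y) = 0.5*2 + 1 = 2

# Empirical mean of L = 2X + 3Y + 1 versus the linearity prediction.
emp = sum(2 * x + 3 * y + 1 for x, y in zip(xs, ys)) / n
theory = 2 * 2 + 3 * 2 + 1   # = 11; no independence used anywhere
print(emp, theory)
```

Despite the dependence between $X$ and $Y$, the empirical mean lands on the linearity prediction.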
Theorem: Variance Rules Under Independence
If $X$ and $Y$ are independent, then
$$\text{Var}(X+Y)=\text{Var}(X)+\text{Var}(Y).$$
For independent $X_1,\ldots,X_n$,
$$\text{Var}\!\left(\sum_{i=1}^{n}a_iX_i\right)=\sum_{i=1}^{n}a_i^2\,\text{Var}(X_i).$$
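The squared-coefficient rule can be sanity-checked by simulation (illustrative numbers, Python standard library):

```python
import random
from statistics import variance

random.seed(2)
n = 200_000

# Independent X ~ N(0, 2^2) and Y ~ N(0, 3^2).
xs = [random.gauss(0, 2) for _ in range(n)]
ys = [random.gauss(0, 3) for _ in range(n)]

# Var(2X - Y) should be 2^2 * 4 + (-1)^2 * 9 = 25,
# not 4 + 9 (coefficients forgotten) and not (2*2 - 1*3)^2.
emp_var = variance(2 * x - y for x, y in zip(xs, ys))
print(emp_var)
```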
Theorem: Linear Combination of Independent Normals
If $X_1,\ldots,X_n$ are independent and $X_i\sim N(\mu_i,\sigma_i^2)$, then for constants $a_1,\ldots,a_n$,
$$L=\sum_{i=1}^{n}a_iX_i$$
is normally distributed with
$$E(L)=\sum_{i=1}^{n}a_i\mu_i,\quad \text{Var}(L)=\sum_{i=1}^{n}a_i^2\sigma_i^2.$$
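Python's standard-library `statistics.NormalDist` turns this theorem into direct probability calculations; the distributions and coefficients below are illustrative:

```python
from math import sqrt
from statistics import NormalDist

# Illustrative: X1 ~ N(10, 2^2), X2 ~ N(5, 1^2), independent; L = 3*X1 - 2*X2.
mean_L = 3 * 10 - 2 * 5              # sum of a_i * mu_i = 20
var_L = 3**2 * 2**2 + (-2)**2 * 1    # sum of a_i^2 * sigma_i^2 = 36 + 4 = 40

L = NormalDist(mean_L, sqrt(var_L))  # NormalDist takes the SD, not the variance
p = 1 - L.cdf(25)                    # P(L > 25)
print(mean_L, var_L, round(p, 4))
```

Note the common trap handled in the code: `NormalDist` expects the standard deviation, so the variance must be square-rooted.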
- Read and interpret $X \sim N(\mu,\sigma^2)$.
- Connect mean and variance to realistic contexts.
- Standardize to the $Z$-distribution for probability calculation.
Real-World Motivation
Many quantities are affected by many small additive factors:
- product weight = machine setting + material fluctuation + moisture + measurement noise;
- commute time = base travel time + traffic light delays + boarding delay + random crowd effects;
- blood pressure reading = physiological level + temporary stress + instrument noise.
When many small effects add up, a bell-shaped distribution is often a useful approximation.
If $X \sim N(\mu,\sigma^2)$, define
$$Z=\frac{X-\mu}{\sigma}\sim N(0,1).$$
Then
$$P(X\le x)=P\!\left(Z\le \frac{x-\mu}{\sigma}\right).$$
Example: Battery Weight
Suppose battery weight $W \sim N(75,\,3^2)$ grams. Then
$$P(W>80) = P\!\left(Z>\frac{80-75}{3}\right) = P(Z>1.667).$$
So the problem is converted to standard normal table use.
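The table lookup can also be checked numerically; a minimal sketch with Python's standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

W = NormalDist(mu=75, sigma=3)   # battery weight in grams
p = 1 - W.cdf(80)                # P(W > 80) = P(Z > 5/3)
print(round(p, 4))
```

The result agrees with the standard normal table value for $P(Z>1.667)$, about 0.048.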
Key Distinction
$$E(X+Y)=E(X)+E(Y) \quad \text{(always true)}$$
$$\text{Var}(X+Y)=\text{Var}(X)+\text{Var}(Y) \quad \text{(not always true)}$$
This is one of the most important contrasts in the chapter.
Example: Hedging Trade with Meituan and Alibaba (Expected Return)
Let
- $M$: one-day return (%) of Meituan stock,
- $A$: one-day return (%) of Alibaba stock.

A trader builds a hedge portfolio:
$$P=M-0.7A$$
(long Meituan, short $0.7$ units of Alibaba). Suppose
$$E(M)=0.40,\quad E(A)=0.25.$$
Then
$$E(P)=E(M)-0.7E(A)=0.40-0.7(0.25)=0.225.$$
So the expected one-day return of the hedged portfolio is $0.225\%$.
Example: Hedged Portfolio Variance (Independence Given)
Continue with $P=M-0.7A$, and assume
$$\text{Var}(M)=2.25,\quad \text{Var}(A)=1.44.$$
If the question states that $M$ and $A$ are independent, then
$$\text{Var}(P)=\text{Var}(M)+(-0.7)^2\,\text{Var}(A)=2.25+0.49(1.44)=2.9556.$$
Therefore
$$\text{SD}(P)=\sqrt{2.9556}\approx 1.72.$$
Key point: coefficients must be squared in variance calculations.
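The arithmetic above, as a minimal check in Python:

```python
from math import sqrt

var_m, var_a = 2.25, 1.44
a = -0.7                        # coefficient on A in P = M - 0.7A

# Independence given: the coefficient is squared, never left linear.
var_p = var_m + a**2 * var_a    # 2.25 + 0.49 * 1.44 = 2.9556
sd_p = sqrt(var_p)              # about 1.72
print(var_p, round(sd_p, 2))
```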
Example: Hedged Portfolio Variance (Independence Not Given)
If the question only gives
$$\text{Var}(M)=2.25,\quad \text{Var}(A)=1.44,$$
but does not state whether Meituan and Alibaba returns are independent, then for $P=M-0.7A$ you cannot directly write
$$\text{Var}(P)=2.25+0.49(1.44).$$
In general, $\text{Var}(P)=\text{Var}(M)+0.49\,\text{Var}(A)-1.4\,\text{Cov}(M,A)$. If the returns are positively correlated, then $\text{Cov}(M,A)>0$, so the variance of $P$ is lower than in the independent case; this is exactly why shorting a correlated stock reduces the risk of the hedge portfolio.
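The effect of the covariance term can be made concrete; the covariance value below is hypothetical, chosen only to illustrate the direction of the effect for $P = M - 0.7A$:

```python
var_m, var_a = 2.25, 1.44

def var_p(cov_ma):
    # Var(M - 0.7A) = Var(M) + 0.49*Var(A) - 2*0.7*Cov(M, A)
    return var_m + 0.49 * var_a - 1.4 * cov_ma

indep = var_p(0.0)       # 2.9556, recovers the independence case
pos_corr = var_p(0.9)    # hypothetical positive covariance between the returns
print(indep, pos_corr)   # positive covariance shrinks the hedge's variance
```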
Example: Difference of Blood Pressure Readings
Morning and evening blood pressures:
$$M\sim N(120,25),\quad E\sim N(115,36),$$
with independence. Define $D=M-E$. Then
$$E(D)=120-115=5,\quad \text{Var}(D)=25+36=61.$$
So
$$D\sim N(5,61).$$
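An illustrative follow-up question: the probability that the morning reading exceeds the evening reading is $P(D>0)$, which the standard library can evaluate directly:

```python
from math import sqrt
from statistics import NormalDist

D = NormalDist(mu=5, sigma=sqrt(61))   # D = M - E ~ N(5, 61)
p = 1 - D.cdf(0)                       # P(D > 0): morning reading is higher
print(round(p, 4))
```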
Example: Weighted Combination
Suppose $X\sim N(60,4)$ and $Y\sim N(45,9)$ are independent. Let
$$T=2X-3Y.$$
Then
$$E(T)=2(60)-3(45)=-15,$$
$$\text{Var}(T)=2^2(4)+(-3)^2(9)=16+81=97.$$
Hence
$$T\sim N(-15,97).$$
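The same pattern in code, checking the mean and variance and then answering an illustrative tail query $P(T>0)$:

```python
from math import sqrt
from statistics import NormalDist

# T = 2X - 3Y with X ~ N(60, 4), Y ~ N(45, 9), independent.
mean_t = 2 * 60 - 3 * 45          # -15
var_t = 2**2 * 4 + (-3)**2 * 9    # 16 + 81 = 97

T = NormalDist(mean_t, sqrt(var_t))
p = 1 - T.cdf(0)                  # P(T > 0), an illustrative query
print(mean_t, var_t, round(p, 4))
```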
If $X_1,\ldots,X_n$ are independent $N(\mu,\sigma^2)$, then
$$\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i$$
is also normal, and
$$\bar{X}\sim N\!\left(\mu,\frac{\sigma^2}{n}\right).$$
This single result powers much of later confidence interval work.
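A minimal sketch with illustrative numbers (deliberately not the class-practice values): with $X_i \sim N(50, 10^2)$ and $n=25$, the sample mean is $N(50, 100/25) = N(50, 2^2)$:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 10, 25
xbar = NormalDist(mu, sigma / sqrt(n))   # Xbar ~ N(50, sigma^2 / n) = N(50, 2^2)
p = 1 - xbar.cdf(52)                     # P(Xbar > 52) = P(Z > 1)
print(round(p, 4))
```

Averaging divides the variance by $n$, so the tail probability for $\bar{X}$ is much smaller than the corresponding one for a single $X_i$.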
Exercise: Class Practice
Random variables $X_1,\ldots,X_{40}$ are independent with $X_i\sim N(120,25)$. Let $\bar{X}$ be the sample mean.
(a) Find the distribution of $\bar{X}$.
(b) Find $P(\bar{X}>122)$.
(c) Explain in one or two sentences why $\bar{X}$ is much less variable than a single $X_i$.
Exercise: WST03/01/June20/7
(a) A company makes cricket balls and tennis balls. The weights of cricket balls, $C$ grams, follow a normal distribution
$$C \sim N(160, 1.25^2).$$
Three cricket balls are selected at random. Find the probability that their total weight is more than 475.8 grams. (4)
(b) The weights of tennis balls, $T$ grams, follow a normal distribution
$$T \sim N(60, 2^2).$$
Five tennis balls and two cricket balls are selected at random. Find the probability that the total weight of the five tennis balls and the two cricket balls is more than 625 grams. (4)
(c) A random sample of $n$ tennis balls $T_1, T_2, T_3, \ldots, T_n$ is taken. The random variable $Y$ is defined by
$$Y = (n-1)T_1 - \sum_{i=2}^{n} T_i.$$
Given that
$$P(Y > 40) = 0.0838,$$
correct to 4 decimal places, find $n$. (8)
Exercise: 6691/01/May19/7
(a) Two independent random samples $X_1, X_2, X_3, X_4$ and $Y_1, Y_2, Y_3, Y_4$ are each taken from a normal population with mean $\mu$ and variance $\sigma^2$. Find the distribution of the random variable $M = 4X_1 - 3X_2 - \overline{Y}$. (4)
(b) Hence show that $P(4X_1 < 3X_2 + \overline{Y} + \sigma) = 0.579$ to 3 significant figures. (3)
(c) A random sample $W_1, W_2, W_3, W_4$ is also taken from a normal population with mean $\mu$ and variance $\sigma^2$. Find the distribution of the random variable $T = 4W_1 - 3W_2 - \overline{W}$. (5)
Exercise: WST03/01/Jan21/6
(a) A potter makes decorative tiles in two colours, red and yellow. The length, $R$ cm, of the red tiles has a normal distribution with mean 15 cm and standard deviation 1.5 cm. The length, $Y$ cm, of the yellow tiles has the normal distribution $N(12, 0.8^2)$. The random variables $R$ and $Y$ are independent. A red tile and a yellow tile are chosen at random. Find the probability that the yellow tile is longer than the red tile. (4)
(b) Taruni buys 3 red tiles and 1 yellow tile. Find the probability that the total length of the 3 red tiles is less than 4 times the length of the yellow tile. (7)
(c) Stefan defines the random variable $X = aR + bY$, where $a$ and $b$ are constants. He wants to use values of $a$ and $b$ such that $X$ has a mean of 780 and minimum variance. Find the value of $a$ and the value of $b$ that Stefan should use. (7)