Variance of a Random Variable
Let \(X\) be a discrete random variable, and let \(p_X\) be the probability mass function of \(X\), which means that \(p_X(x) = \mathbb{P}(X = x)\). That is, if we want to know the probability that \(X\) takes the value \(x\), we can compute it using \(p_X\).
The expectation of \(X\) is defined as \[ \begin{aligned} \mathbb{E}[X] &= \sum_{x} x p_X(x)\;, \end{aligned} \] where the sum runs across all possible values of \(x\).
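As a quick illustration (the fair six-sided die below is an assumed example of ours, not part of the problem), the expectation can be computed directly from the PMF by summing \(x\, p_X(x)\):

```python
# Sketch: compute E[X] from a PMF, using an assumed fair six-sided die
# as the example distribution.
p_X = {x: 1/6 for x in range(1, 7)}  # p_X(x) = P(X = x)

# E[X] = sum over all x of x * p_X(x)
expectation = sum(x * p for x, p in p_X.items())
print(expectation)  # 3.5 for a fair die
```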
The variance of \(X\) is defined as
\[ \begin{aligned} \mathrm{Var}(X) &= \mathbb{E}[(X - \mu)^2]\;, \end{aligned} \]
where \(\mu = \mathbb{E}[X]\). The variance is a measure of how spread out \(X\) is around its mean: if \(X\) always takes on its average value \(\mu\), then \(\mathrm{Var}(X) = 0\), while if \(X\) often takes on values far from \(\mu\), then \(\mathrm{Var}(X)\) is large.
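Continuing the same assumed die example, the variance can be computed straight from the definition by averaging the squared deviations from \(\mu\):

```python
# Sketch: compute Var(X) = E[(X - mu)^2] for the assumed die PMF.
p_X = {x: 1/6 for x in range(1, 7)}

mu = sum(x * p for x, p in p_X.items())                   # E[X]
variance = sum((x - mu)**2 * p for x, p in p_X.items())   # E[(X - mu)^2]
print(variance)  # 35/12, about 2.9167, for a fair die
```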
Part A
Recall the linearity property of expectation: if \(X,Y\) are random variables and \(a,b\) are constants, then
\[ \begin{aligned} \mathbb{E}[aX + bY] &= a \mathbb{E}[X] + b\mathbb{E}[Y]\;. \end{aligned} \]
Use linearity to write a proof that
\[ \begin{aligned} \mathrm{Var}(X) &= \mathbb{E}[X^2] - \mu^2\;. \end{aligned} \]
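As a numerical sanity check (not a substitute for the proof), both sides of this identity can be evaluated on an example PMF and compared; the PMF below is an assumption of ours:

```python
# Sketch: numerically compare E[(X - mu)^2] with E[X^2] - mu^2 on an
# assumed example PMF.  This checks the identity; it is not a proof.
p_X = {0: 0.2, 1: 0.5, 3: 0.3}

mu = sum(x * p for x, p in p_X.items())
lhs = sum((x - mu)**2 * p for x, p in p_X.items())    # Var(X) by definition
rhs = sum(x**2 * p for x, p in p_X.items()) - mu**2   # E[X^2] - mu^2
print(lhs, rhs)  # the two values agree (up to floating-point error)
```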
Part B
Suppose that I want to predict the value of a random variable \(Y\) with variance \(\mathrm{Var}(Y)\). To do this, I am going to guess a single number \(\hat{y}\) and measure my accuracy in terms of mean-squared error (MSE). The MSE of my guess is
\[ \begin{aligned} \mathrm{MSE}(\hat{y}) &= \mathbb{E}[(Y - \hat{y})^2]\;. \end{aligned} \]
Determine (a) the optimal value of \(\hat{y}\) (i.e. the value that makes the MSE smallest) and (b) the value of the MSE corresponding to this optimal \(\hat{y}\).
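Before deriving the answer analytically, it may help to explore the problem numerically. The sketch below (the distribution for \(Y\) and the candidate guesses are assumptions of ours) evaluates \(\mathrm{MSE}(\hat{y})\) at a few candidate guesses; the derivation is still yours to do.

```python
# Sketch: evaluate MSE(y_hat) = E[(Y - y_hat)^2] for several candidate
# guesses, using an assumed example PMF for Y.  Exploration only; the
# problem asks for the optimal guess to be derived analytically.
p_Y = {0: 0.2, 1: 0.5, 3: 0.3}

def mse(y_hat):
    """Expected squared error of the guess y_hat under p_Y."""
    return sum((y - y_hat)**2 * p for y, p in p_Y.items())

for y_hat in [0.0, 0.5, 1.0, 1.4, 2.0, 3.0]:
    print(f"MSE({y_hat}) = {mse(y_hat):.4f}")
```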
© Phil Chodrow, 2025