Ordinary Least-Squares Linear Regression
Regression refers to the problem of predicting a continuous-valued outcome \(y\) from a set of features \(\mathbf{x}\). Much like with classification, we often perform regression by minimizing an empirical risk function:
\[ \begin{aligned} L(\mathbf{w}) = \frac{1}{n} \sum_{i=1}^n \ell(\langle \mathbf{w}, \mathbf{x}_i\rangle, y_i) + \Lambda R(\mathbf{w})\;, \end{aligned} \]
where \(R\) is the regularizer, a function that discourages the entries of \(\mathbf{w}\) from growing too large, and \(\Lambda\) is the regularization strength.
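To make the pieces concrete, here is a minimal NumPy sketch of this regularized empirical risk. The function names, the squared-error loss (which anticipates the choice made just below), and the ridge-style regularizer \(R(\mathbf{w}) = \sum_j w_j^2\) are illustrative assumptions, not requirements of the framework.

```python
import numpy as np

def empirical_risk(w, X, y, loss, R, Lam):
    """Regularized empirical risk: mean per-example loss plus Lam * R(w).

    w    : weight vector, shape (p,)
    X    : feature matrix, shape (n, p); row i is x_i
    y    : target vector, shape (n,)
    loss : vectorized per-example loss, called as loss(predictions, targets)
    R    : regularizer, called as R(w)
    Lam  : regularization strength (Lambda in the text)
    """
    preds = X @ w                    # inner products <w, x_i> for every i
    return loss(preds, y).mean() + Lam * R(w)

# Illustrative choices of loss and regularizer:
squared_error = lambda p, t: 0.5 * (p - t) ** 2
ridge = lambda w: np.sum(w ** 2)
```

Setting `Lam = 0` recovers the unregularized risk discussed next.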
In unregularized linear least-squares regression, we make the choices \(\ell(\langle \mathbf{w}, \mathbf{x}_i\rangle, y_i) = \frac{1}{2}(\langle \mathbf{w}, \mathbf{x}_i\rangle - y_i)^2\) and \(R(\mathbf{w}) = 0\). This loss function is called the squared error loss, and it is the most common choice for regression problems. If there is only a single feature \(x_i\), paired with a constant intercept feature, then \(\langle \mathbf{w}, \mathbf{x}_i\rangle = w_0 + w_1 x_i\) and the loss function simplifies to:
\[ \begin{aligned} L(w_0, w_1) = \frac{1}{2n} \sum_{i=1}^n (w_0 + w_1 x_i - y_i)^2\;. \end{aligned} \]
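As a sanity check on the formula, here is a direct NumPy translation of this single-feature loss; the function name and the toy data are made up for illustration.

```python
import numpy as np

def single_feature_loss(w0, w1, x, y):
    """Squared error loss L(w0, w1) = (1 / 2n) * sum_i (w0 + w1*x_i - y_i)^2."""
    resid = w0 + w1 * x - y          # residuals w0 + w1*x_i - y_i
    return (resid ** 2).sum() / (2 * len(x))

# Toy data, invented for illustration:
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
print(single_feature_loss(1.0, 2.0, x, y))  # loss of the line y = 1 + 2x
```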
A special property of ordinary least-squares regression is that it is not necessary to use gradient descent to compute the optimal values of \(w_0\) and \(w_1\); they can be found exactly, in closed form.
Part A
Compute the partial derivatives \(\frac{\partial L}{\partial w_0}\) and \(\frac{\partial L}{\partial w_1}\) of the loss function with respect to \(w_0\) and \(w_1\). Recall that the partial derivative with respect to \(w_0\) is the derivative of \(L\) with respect to \(w_0\) while holding \(w_1\) constant. Don’t forget the chain rule!
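As a reminder of the relevant pattern (stated for a generic differentiable function \(g\), so as not to give away the answer), the chain rule says:
\[ \begin{aligned} \frac{d}{dw} \frac{1}{2}\big(g(w)\big)^2 = g(w)\, g'(w)\;. \end{aligned} \]
Each summand of \(L\) has exactly this form, with a different \(g'\) depending on whether you differentiate with respect to \(w_0\) or \(w_1\).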
Part B
The equations describing the minimizing choice of \(\mathbf{w} = (w_0, w_1)\) are given by setting the partial derivatives of the loss function to zero:
\[ \begin{aligned} \frac{\partial L}{\partial w_0} &= 0 \\ \frac{\partial L}{\partial w_1} &= 0\;. \end{aligned} \]
Please solve these equations and give formulae for the optimal values of \(w_0\) and \(w_1\) in terms of the data features \(x_i\) and the target values \(y_i\).
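Once you have candidate formulae, a quick numerical check can catch algebra slips. Because multiplying the objective by the constant \(\frac{1}{2n}\) does not change its minimizer, your formulae should agree with any standard least-squares fitter. The sketch below, which assumes NumPy and uses np.polyfit purely as a convenient reference, prints coefficients you can compare against your own formulae on made-up data.

```python
import numpy as np

# Synthetic data, invented for illustration.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 0.5 * x + rng.normal(scale=0.2, size=50)

# np.polyfit minimizes the sum of squared residuals for a degree-1 polynomial,
# returning coefficients in order [w1, w0] (highest degree first).
w1_ref, w0_ref = np.polyfit(x, y, 1)
print(w0_ref, w1_ref)  # compare to the values produced by your formulae
```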
© Phil Chodrow, 2025