Data = Signal + Noise
Reading
There’s no designated reading today, but you will likely find it helpful to review some differential calculus to complete the warmup problem below.
Warmup Problem
An important task in machine learning is optimization: finding the values of an input that maximize or minimize some output function. It turns out that in many machine learning applications, it’s easier to optimize the logarithm of a function (i.e. \(\log f(x)\)) than the function itself (i.e. \(f(x)\)). In the following problem we’ll check that this gives us the same answers, while reviewing and practicing our skills with differential calculus.
Recall that, if \(f:\mathbb{R}\rightarrow \mathbb{R}\) is a differentiable function, then \(x^*\) is a critical point of \(f\) if \(\frac{df}{dx}(x^*) = f'(x^*) = 0\).
Part A
Theorem 1 (log Preserves Critical Points) Let \(f:\mathbb{R}\rightarrow \mathbb{R}\) be a differentiable function such that \(f(x) > 0\) for all \(x \in \mathbb{R}\). Then, \(x^*\) is a critical point of \(f\) if and only if \(x^*\) is also a critical point of the function \(h(x) = \log f(x)\).
(You may assume that the \(\log\) is base \(e\), which is sometimes also written \(\ln f(x)\).)
Please prove Theorem 1. It’s sufficient to calculate \(\frac{dh}{dx}\) in terms of \(\frac{df}{dx}\) (use the chain rule!). Can it be true that one of \(\frac{df}{dx}(x^*)\) or \(\frac{dh}{dx}(x^*)\) is zero while the other is non-zero?
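Before (or after) writing the proof, you can sanity-check the theorem numerically. The sketch below uses a hypothetical example function \(f(x) = x^2 + 1\), which is positive everywhere and has a single critical point at \(x^* = 0\), and approximates derivatives with central differences; it is an illustration of the claim, not a substitute for the proof.

```python
import math

def f(x):
    # Example function (an assumption for illustration): f(x) = x^2 + 1 > 0 for all x.
    return x * x + 1.0

def h(x):
    # h = log f, as in Theorem 1 (natural log).
    return math.log(f(x))

def deriv(g, x, eps=1e-6):
    # Central-difference approximation of g'(x).
    return (g(x + eps) - g(x - eps)) / (2.0 * eps)

# At the critical point x* = 0, both derivatives vanish.
print(deriv(f, 0.0))  # ≈ 0
print(deriv(h, 0.0))  # ≈ 0

# Away from x*, neither derivative is zero; note h'(1) = f'(1)/f(1) = 2/2 = 1.
print(deriv(f, 1.0))  # ≈ 2
print(deriv(h, 1.0))  # ≈ 1
```

The last comparison hints at the structure of the proof: the two derivatives differ only by the strictly positive factor \(1/f(x)\).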
Part B
The second derivative test says that if \(x^*\) is a critical point of a twice-differentiable function \(f:\mathbb{R}\rightarrow \mathbb{R}\), then \(x^*\) is a local minimum (rather than a maximum or inflection point) of \(f\) if \(\frac{d^2 f}{dx^2}(x^*) > 0\).
Prove that if \(x^*\) is a critical point of \(f\) and also of \(h(x) = \log f(x)\), then \(x^*\) is a local minimum of \(f\) if and only if it is a local minimum of \(h\).
Hint: apply the second derivative test to \(h\) by calculating \(\frac{d^2 h}{dx^2}\) in terms of \(f\) and its derivatives. Is it possible for \(\frac{d^2 f}{dx^2}(x^*)\) and \(\frac{d^2 h}{dx^2}(x^*)\) to have different signs?
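As in Part A, you can check the claim numerically before proving it. This sketch again uses the hypothetical example \(f(x) = x^2 + 1\), whose critical point \(x^* = 0\) is a local minimum of both \(f\) and \(h = \log f\), and approximates second derivatives with a standard three-point finite difference.

```python
import math

def f(x):
    # Example function (an assumption for illustration): f(x) = x^2 + 1 > 0,
    # with critical point x* = 0, f''(0) = 2.
    return x * x + 1.0

def h(x):
    # h = log f.
    return math.log(f(x))

def second_deriv(g, x, eps=1e-4):
    # Three-point finite-difference approximation of g''(x).
    return (g(x + eps) - 2.0 * g(x) + g(x - eps)) / (eps * eps)

# Both second derivatives are positive at x* = 0, so x* is a local
# minimum of both functions. Note h''(0) = f''(0)/f(0) = 2/1 = 2 here,
# since the cross term vanishes when f'(x*) = 0.
print(second_deriv(f, 0.0))  # ≈ 2
print(second_deriv(h, 0.0))  # ≈ 2
```

The comment about the vanishing cross term is the key observation for the proof: at a shared critical point, \(\frac{d^2 h}{dx^2}(x^*)\) reduces to \(\frac{d^2 f}{dx^2}(x^*)\) divided by the strictly positive number \(f(x^*)\).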