
A First Look at Gradient Descent

In this problem, we’re going to give a computational introduction to two fundamental ideas of modern machine learning:

  1. Minimizing a function is often a useful thing to do, and
  2. This can often be done by moving in directions guided by the derivative of the function.

For our first example, we’ll see how minimizing a function by iteratively moving in directions guided by its derivative can be used to solve a familiar math problem.

Let $\mathbb{R}_+$ be the set of strictly positive real numbers. Let $a \in \mathbb{R}_+$. Consider the function $g_a : \mathbb{R}_+ \rightarrow \mathbb{R}$ with formula

$$
g_a(x) = \frac{1}{2}x^2 - a\log x\;.
$$

Here and elsewhere in this class, $\log$ always refers to the natural log (base $e$), which you may have also seen written $\ln$.
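
Before doing any calculus, it can help to evaluate $g_a$ numerically. Below is a minimal sketch (the helper name `g` is our own, not part of the problem) that tabulates a few values for $a = 9$; watching where the values bottom out already hints at where the minimum sits.

```python
import math

# Minimal sketch: evaluate g_a(x) = (1/2) x^2 - a log(x) for x > 0.
def g(x, a):
    return 0.5 * x**2 - a * math.log(x)

# Tabulate a few values for a = 9; the smallest entry hints at the minimizer.
for x in [1.0, 2.0, 3.0, 4.0, 5.0]:
    print(f"g_9({x}) = {g(x, a=9):.3f}")
```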

Part A

A critical point of $g_a$ is a point $x_0$ where $\frac{dg_a(x_0)}{dx} = 0$. Find the critical point of this function in the set $\mathbb{R}_+$. We’ll call this point $x_a$.

Part B

Use the second derivative test to show that $g_a$ is strictly convex on the set $\mathbb{R}_+$. It follows that the critical point $x_a$ is a global minimum of $g_a$.
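
For reference, the facts being combined here are standard: if the second derivative of $g_a$ is positive everywhere on $\mathbb{R}_+$, then $g_a$ is strictly convex there, and a strictly convex function has at most one critical point, which must be its global minimum. In symbols,

$$
\frac{d^2 g_a(x)}{dx^2} > 0 \ \text{ for all } x \in \mathbb{R}_+ \quad \Longrightarrow \quad g_a \text{ is strictly convex on } \mathbb{R}_+\,.
$$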

Part C

Implement the following algorithm as a Python function:

Inputs: $a \in \mathbb{R}_+$, tolerance $\epsilon > 0$, learning rate $\alpha$, maxsteps = 100.

  1. Start with an initial guess $x = a$.
  2. Set $x' \gets 2a$ and $j \gets 0$.
  3. While $|x - x'| > \epsilon$:
    • If $j > {}$ maxsteps:
      • Break.
    • $x' \gets x$.
    • $x \gets x - \alpha \frac{dg_a(x)}{dx}$.
    • $j \gets j + 1$.
  4. Return $x$.

You should use the formula for $\frac{dg_a(x)}{dx}$ that you found in Part A. A sketch of one possible implementation appears below.
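
Here is a minimal sketch of the algorithm above, written under two assumptions: the function name `gradient_descent_min` is hypothetical (the test call below uses `mystery_fun`, so name yours accordingly), and the body of `dg` is a candidate formula for the derivative that you should check against your own answer to Part A.

```python
def gradient_descent_min(a, epsilon, alpha, maxsteps=100):
    # Candidate derivative of g_a: verify this against your Part A answer.
    def dg(x):
        return x - a / x

    x = a             # step 1: initial guess
    x_prev = 2 * a    # step 2: x', chosen far from x so the loop starts
    j = 0
    while abs(x - x_prev) > epsilon:   # step 3: stop once iterates stabilize
        if j > maxsteps:               # give up after maxsteps iterations
            break
        x_prev = x
        x = x - alpha * dg(x)          # step in the direction of decrease
        j += 1
    return x                           # step 4
```

The extra variable `x_prev` plays the role of $x'$: it stores the previous iterate so that the loop can stop once successive iterates agree to within $\epsilon$.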

Test your function like this:

```python
mystery_fun(a = 9, epsilon = 1e-8, alpha = 0.2)
```

Please show:

  1. One setting of $\alpha$ for which your function returns a real number very close to the exact value of $x_a$.
  2. One setting of $\alpha$ for which your function fails to return a real number close to the exact value of $x_a$ within the maximum number of steps.

Part D

Is it possible to compute the positive square root of a positive real number using only the operations of addition, subtraction, multiplication, and division?



© Phil Chodrow, 2025