Calculus

Calculus for Optimization

Training a neural net = minimizing a loss. Calculus tells us which way to step:

  • The derivative measures the slope — how fast the loss changes when we nudge a parameter.
  • For multi-parameter models, the gradient \nabla_\theta L is the vector of partial derivatives.
  • Optimizers follow -\nabla_\theta L downhill.

The geometric picture: a function and its tangent line at a point. The tangent’s slope is the derivative.

Derivatives, by definition

The derivative of f at x is f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}.

Take a concrete example, u = f(x) = 3x^2 - 4x (analytic derivative: f'(x) = 6x - 4, so f'(1) = 2):

%matplotlib inline
from d2l import jax as d2l
from matplotlib_inline import backend_inline
import numpy as np
def f(x):
    return 3 * x ** 2 - 4 * x

Verifying numerically

At x = 1, the difference quotient \frac{f(x+h) - f(x)}{h} should approach f'(1) = 2 as h \to 0:

for h in 10.0**np.arange(-1, -6, -1):
    print(f'h={h:.5f}, numerical limit={(f(1+h)-f(1))/h:.5f}')
h=0.10000, numerical limit=2.30000
h=0.01000, numerical limit=2.03000
h=0.00100, numerical limit=2.00300
h=0.00010, numerical limit=2.00030
h=0.00001, numerical limit=2.00003

It does — but small h runs into floating-point cancellation, which is exactly the problem autograd sidesteps in the next chapter.

Function and tangent line

Plot u = f(x) alongside its tangent at x=1, y = 2x - 3:

x = np.arange(0, 3, 0.1)
plot(x, [f(x), 2 * x - 3], 'x', 'f(x)', legend=['f(x)', 'Tangent line (x=1)'])

The tangent’s slope is f'(1) = 2 — derivatives are slopes, made geometric. (The d2l package wraps a few matplotlib helpers — set_figsize, plot, set_axes — used throughout the book. See the source if you’re curious.)

Recap

  • Derivative = limit of the difference quotient.
  • Multi-variable functions have partial derivatives, one per input; the gradient bundles them into a vector.
  • The chain rule lets you differentiate compositions — the engine inside backprop.
  • We won’t compute derivatives by hand for long: the next chapter shows how autograd handles it automatically.