Empirical bias / variance

Statistics

Estimator Quality

A primer on the vocabulary of estimators, which ML borrows heavily:

Estimator — a procedure that takes data and outputs a guess (e.g. sample mean, MLE).
Bias — \mathbb{E}[\hat\theta] - \theta. Systematic error.
Variance — \text{Var}(\hat\theta). Noise across datasets.
MSE = bias^2 + variance — the basic decomposition that explains overfitting and why regularization helps.
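
The decomposition in the last item follows from adding and subtracting \mathbb{E}[\hat\theta] inside the square. This is the standard derivation, written with the same symbols as the definitions above; the cross term vanishes because \mathbb{E}[\hat\theta - \mathbb{E}[\hat\theta]] = 0.

```latex
\begin{aligned}
\text{MSE}(\hat\theta)
  &= \mathbb{E}\big[(\hat\theta - \theta)^2\big] \\
  &= \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta] + \mathbb{E}[\hat\theta] - \theta)^2\big] \\
  &= \mathbb{E}\big[(\hat\theta - \mathbb{E}[\hat\theta])^2\big]
     + 2\,(\mathbb{E}[\hat\theta] - \theta)\,\mathbb{E}\big[\hat\theta - \mathbb{E}[\hat\theta]\big]
     + (\mathbb{E}[\hat\theta] - \theta)^2 \\
  &= \text{Var}(\hat\theta) + \text{bias}^2
\end{aligned}
```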

This deck makes the bias-variance tradeoff concrete.

Evaluating estimators

An estimator is judged by its sampling distribution: repeat the same experiment on fresh datasets and ask where the estimates center and how widely they vary.

from d2l import mxnet as d2l
import mxnet as mx
from mxnet import np, npx
import random
npx.set_np()

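# Sample data points and give each one a y coordinate for plotting
epsilon = 0.1  # kernel bandwidth for the density heights
# `random.seed` only seeds the stdlib RNG, not MXNet's; seed both so the
# demo is reproducible.
random.seed(8675309)
mx.random.seed(8675309)
xs = np.random.normal(loc=0, scale=1, size=(300,))

# Height of point i: Gaussian bumps centered on the points drawn before it
# (xs[:i]), summed and divided by the total sample count
ys = [np.sum(np.exp(-(xs[:i] - xs[i])**2 / (2 * epsilon**2))
             / np.sqrt(2*np.pi*epsilon**2)) / len(xs) for i in range(len(xs))]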

# Compute true density
xd = np.arange(np.min(xs).item(), np.max(xs).item(), 0.01)
yd = np.exp(-xd**2/2) / np.sqrt(2 * np.pi)

# Plot the results
d2l.plot(xd, yd, 'x', 'density')
d2l.plt.scatter(xs, ys)
d2l.plt.axvline(x=0)
d2l.plt.axvline(x=np.mean(xs), linestyle='--', color='purple')
d2l.plt.title(f'sample mean: {float(np.mean(xs)):.2f}')
d2l.plt.show()
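
The heights `ys` in the plot come from a Gaussian kernel sum. As a rough illustration of that idea, here is a minimal standard-library sketch of a Gaussian kernel density estimate. The function name `gaussian_kde` and the toy `points` are hypothetical, and unlike the deck's loop (which only sums over the points before index `i`) this version averages over every data point.

```python
import math

def gaussian_kde(x, data, bandwidth):
    """Average of Gaussian bumps of width `bandwidth` centered on each data point."""
    norm = math.sqrt(2 * math.pi * bandwidth ** 2)
    total = sum(math.exp(-(x - d) ** 2 / (2 * bandwidth ** 2)) for d in data)
    return total / (norm * len(data))

# Hypothetical toy data: three points near 0, one far away
points = [-0.2, 0.0, 0.1, 1.5]
dense = gaussian_kde(0.0, points, bandwidth=0.1)   # several points nearby
sparse = gaussian_kde(3.0, points, bandwidth=0.1)  # no points nearby
print(f"density near 0: {dense:.3f}, density near 3: {sparse:.3g}")
```

A smaller bandwidth makes each bump narrower, so the estimate follows individual points more closely; a larger one smooths them together.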

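A sampling distribution comes from many datasets → many estimates → empirical mean and spread. As a first step, define the bias and MSE helpers, draw one dataset from \mathcal{N}(1, 4^2), and take its sample mean as the estimate: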

# Statistical bias
def stat_bias(true_theta, est_theta):
    return np.mean(est_theta) - true_theta

# Mean squared error
def mse(data, true_theta):
    return np.mean(np.square(data - true_theta))

theta_true = 1
sigma = 4
sample_len = 10000
samples = np.random.normal(theta_true, sigma, sample_len)
theta_est = np.mean(samples)
theta_est
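
The code above draws a single dataset, so it yields one estimate. To see where estimates center and how widely they vary, the experiment has to be repeated. The following is a minimal sketch using only the Python standard library (not the deck's MXNet setup); the helper name `simulate_sample_means`, the dataset size of 100, and the count of 2000 datasets are assumptions chosen for illustration.

```python
import random
import statistics

def simulate_sample_means(theta, sigma, n, num_datasets, seed=0):
    """Draw `num_datasets` fresh datasets of size `n`; return each sample mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(theta, sigma) for _ in range(n))
        for _ in range(num_datasets)
    ]

theta, sigma, n = 1.0, 4.0, 100
estimates = simulate_sample_means(theta, sigma, n, num_datasets=2000)

center = statistics.fmean(estimates)      # where the estimates center
spread = statistics.pvariance(estimates)  # how widely they vary

print(f"mean of estimates:     {center:.3f} (true theta = {theta})")
print(f"variance of estimates: {spread:.3f} (theory sigma^2/n = {sigma**2 / n:.3f})")
```

The mean of the estimates should land close to the true parameter (the sample mean is unbiased) and their variance close to \sigma^2/n.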

Empirical bias / variance (cont.)

The second pass checks the decomposition numerically: the mean squared deviation of the samples from \theta is computed directly, then rebuilt as the samples' variance plus the squared bias of their mean. The two values agree.

mse(samples, theta_true)

bias = stat_bias(theta_true, theta_est)
np.square(samples.std()) + np.square(bias)
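
The same check can be run on an estimator across repeated datasets. The sketch below, under stated assumptions, compares two variance estimators (divide by n versus divide by n - 1) and confirms that empirical MSE equals bias^2 + variance for each. It uses only the standard library; the helper `decompose` and the simulation sizes are hypothetical choices.

```python
import random
import statistics

def decompose(estimates, true_value):
    """Return empirical (bias, variance, mse) for estimates of `true_value`."""
    bias = statistics.fmean(estimates) - true_value
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean((e - true_value) ** 2 for e in estimates)
    return bias, variance, mse

rng = random.Random(0)
sigma, n, num_datasets = 4.0, 10, 5000

mle_estimates = []       # variance estimate dividing by n (biased)
unbiased_estimates = []  # variance estimate dividing by n - 1
for _ in range(num_datasets):
    data = [rng.gauss(1.0, sigma) for _ in range(n)]
    mle_estimates.append(statistics.pvariance(data))
    unbiased_estimates.append(statistics.variance(data))

results = {}
for name, ests in (("divide by n", mle_estimates),
                   ("divide by n-1", unbiased_estimates)):
    bias, variance, mse = decompose(ests, sigma ** 2)
    results[name] = (bias, variance, mse)
    print(f"{name:14s} bias={bias:7.3f} variance={variance:7.3f} "
          f"mse={mse:7.3f} bias^2+variance={bias ** 2 + variance:7.3f}")
```

The divide-by-n estimator should show a negative bias near -\sigma^2/n, while the divide-by-(n - 1) version should show a bias near zero; in both rows the last two columns match.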

A Gaussian example

Sample mean for \mathcal{N}(\mu, \sigma^2): unbiased, variance \sigma^2/n. Its standard error \hat\sigma/\sqrt{n} yields an approximate 95% confidence interval for \mu (n is N in the code):

# Number of samples
N = 1000

# Sample dataset
samples = np.random.normal(loc=0, scale=1, size=(N,))

# Critical value: 1.96 is the 97.5% quantile of the standard normal, which
# Student's t-distribution approaches for large N
t_star = 1.96

# Construct interval
mu_hat = np.mean(samples)
sigma_hat = samples.std(ddof=1)
(mu_hat - t_star*sigma_hat/np.sqrt(N), mu_hat + t_star*sigma_hat/np.sqrt(N))
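
One way to see what "95%" means is to rebuild the interval on many fresh datasets and count how often it contains the true mean. This is a minimal standard-library sketch, not the deck's MXNet code; the helper `normal_interval` and the simulation sizes are assumptions made for illustration.

```python
import math
import random
import statistics

def normal_interval(data, z=1.96):
    """Approximate 95% interval for the mean: mu_hat +/- z * sigma_hat / sqrt(n)."""
    mu_hat = statistics.fmean(data)
    half_width = z * statistics.stdev(data) / math.sqrt(len(data))
    return mu_hat - half_width, mu_hat + half_width

rng = random.Random(0)
true_mu, n, num_datasets = 0.0, 100, 2000

covered = 0
for _ in range(num_datasets):
    data = [rng.gauss(true_mu, 1.0) for _ in range(n)]
    low, high = normal_interval(data)
    covered += low <= true_mu <= high  # True counts as 1

coverage = covered / num_datasets
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The fraction should come out close to 0.95. Any single interval either contains \mu or it does not; the 95% describes the procedure over repeated datasets.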

Recap

Estimator quality: MSE = bias^2 + variance.
Sample mean is the BLUE for \mu: the best linear unbiased estimator whenever the noise is uncorrelated with zero mean and equal variance (Gauss-Markov). Gaussianity is not required.
Regularization can trade a small increase in bias for a larger reduction in variance.
Same trade-off shows up everywhere: dropout, weight decay, ensembling.
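
The regularization point can be sketched with a toy shrinkage estimator: multiply the sample mean by a factor below 1, which pulls it toward zero. This is only a stand-in for regularization, not a method taken from the deck; the function `shrunk_mean_stats`, the factor 0.5, and the settings \theta = 1, \sigma = 4, n = 10 are hypothetical choices, and only the standard library is used.

```python
import random
import statistics

def shrunk_mean_stats(shrink, theta=1.0, sigma=4.0, n=10,
                      num_datasets=5000, seed=0):
    """Empirical (bias, variance, mse) of the estimator shrink * sample_mean."""
    rng = random.Random(seed)
    estimates = [
        shrink * statistics.fmean(rng.gauss(theta, sigma) for _ in range(n))
        for _ in range(num_datasets)
    ]
    bias = statistics.fmean(estimates) - theta
    variance = statistics.pvariance(estimates)
    return bias, variance, bias ** 2 + variance

plain = shrunk_mean_stats(shrink=1.0)   # ordinary sample mean
shrunk = shrunk_mean_stats(shrink=0.5)  # biased toward zero, lower variance

for name, (bias, variance, mse) in (("plain", plain), ("shrunk", shrunk)):
    print(f"{name:6s} bias={bias:6.3f} variance={variance:6.3f} mse={mse:6.3f}")
```

In this setting the noise is large relative to \theta, so the shrunk estimator should end up with a lower MSE despite its bias. With a much larger \theta or much more data, the squared bias would dominate and shrinkage would hurt.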