PDFs

Random Variables

Continuous Random Variables

Continuous random variables and their summaries:

PDF p(x) — probability density; P(X \in A) = \int_A p.
Mean \mu = \mathbb{E}[X] = \int x\, p(x)\, dx.
Variance \sigma^2 = \mathbb{E}[(X-\mu)^2] — spread.
Covariance \text{Cov}(X, Y) — joint variation.
Correlation \rho — covariance normalized to [-1, 1].

These are the building blocks of every probabilistic analysis in the book.

From discrete to continuous

Discrete variables assign mass to points. Continuous variables assign density, and probabilities come from integrating over intervals.

%matplotlib inline
from d2l import mxnet as d2l
from IPython import display
from mxnet import np, npx
npx.set_np()

# Plot the probability density function for some random variable
x = np.arange(-5, 5, 0.01)
p = 0.2*np.exp(-(x - 3)**2 / 2)/np.sqrt(2 * np.pi) + \
    0.8*np.exp(-(x + 1)**2 / 2)/np.sqrt(2 * np.pi)

d2l.plot(x, p, 'x', 'Density')

A PDF describes relative likelihood locally; only areas under the curve are probabilities, so the density itself can exceed 1.

# Approximate probability using numerical integration
epsilon = 0.01
x = np.arange(-5, 5, 0.01)
p = 0.2*np.exp(-(x - 3)**2 / 2) / np.sqrt(2 * np.pi) + \
    0.8*np.exp(-(x + 1)**2 / 2) / np.sqrt(2 * np.pi)

d2l.set_figsize()
d2l.plt.plot(x, p, color='black')
d2l.plt.fill_between(x.tolist()[300:800], p.tolist()[300:800])
d2l.plt.show()

f'approximate Probability: {np.sum(epsilon*p[300:800])}'

Standard deviation in code

Standard deviation is the square root of variance; these snippets compute both so the scale of spread stays interpretable.

# Define a helper to plot these figures
def plot_chebyshev(a, p):
    d2l.set_figsize()
    d2l.plt.stem([a-2, a, a+2], [p, 1-2*p, p])
    d2l.plt.xlim([-4, 4])
    d2l.plt.xlabel('x')
    d2l.plt.ylabel('p.m.f.')

    d2l.plt.hlines(0.5, a - 4 * np.sqrt(2 * p),
                   a + 4 * np.sqrt(2 * p), 'black', lw=4)
    d2l.plt.vlines(a - 4 * np.sqrt(2 * p), 0.53, 0.47, 'black', lw=1)
    d2l.plt.vlines(a + 4 * np.sqrt(2 * p), 0.53, 0.47, 'black', lw=1)
    d2l.plt.title(f'p = {p:.3f}')

    d2l.plt.show()

# Plot interval when p > 1/8
plot_chebyshev(0.0, 0.2)

# Plot interval when p = 1/8
plot_chebyshev(0.0, 0.125)

# Plot interval when p < 1/8
plot_chebyshev(0.0, 0.05)

Continuous mean and variance

For continuous variables, sums become integrals: weight each value or squared deviation by its density.

# Plot the Cauchy distribution p.d.f.
x = np.arange(-5, 5, 0.01)
p = 1 / (1 + x**2)

d2l.plot(x, p, 'x', 'p.d.f.')

# Plot the integrand needed to compute the variance
x = np.arange(-20, 20, 0.01)
p = x**2 / (1 + x**2)

d2l.plot(x, p, 'x', 'integrand')

Covariance and correlation

Covariance keeps the original units; correlation normalizes by standard deviations so the result lies between -1 and 1.

# Plot a few random variables adjustable covariance
covs = [-0.9, 0.0, 1.2]
d2l.plt.figure(figsize=(12, 3))
for i in range(3):
    X = np.random.normal(0, 1, 500)
    Y = covs[i]*X + np.random.normal(0, 1, (500))

    d2l.plt.subplot(1, 4, i+1)
    d2l.plt.scatter(X.asnumpy(), Y.asnumpy())
    d2l.plt.xlabel('X')
    d2l.plt.ylabel('Y')
    d2l.plt.title(f'cov = {covs[i]}')
d2l.plt.show()

# Plot a few random variables adjustable correlations
cors = [-0.9, 0.0, 1.0]
d2l.plt.figure(figsize=(12, 3))
for i in range(3):
    X = np.random.normal(0, 1, 500)
    Y = cors[i] * X + np.sqrt(1 - cors[i]**2) * np.random.normal(0, 1, 500)

    d2l.plt.subplot(1, 4, i + 1)
    d2l.plt.scatter(X.asnumpy(), Y.asnumpy())
    d2l.plt.xlabel('X')
    d2l.plt.ylabel('Y')
    d2l.plt.title(f'cor = {cors[i]}')
d2l.plt.show()

Recap

PDF integrates to 1; the integral over an interval is the probability of falling in it.
\mathbb{E}, \text{Var} are integrals weighted by the PDF.
Correlation = covariance / product of std devs; measures linear relationship strength.
Foundation for cross-entropy, KL divergence, expectations through stochastic gradients, …