import torch
from torch import nn
A neural network is a tree of parameters: the weight matrices and bias vectors that gradient descent updates. Training is one thing you do with them; this deck covers the others.
A nested module is just a tree. Each module is a node; each parameter is a leaf:
net (Sequential)
├─ 0: Linear
│  ├─ weight (8, 4)
│  └─ bias (8,)
├─ 1: ReLU (no params)
└─ 2: Linear
   ├─ weight (1, 8)
   └─ bias (1,)
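To see that PyTorch really stores the model this way, a minimal sketch can rebuild the tree from the module itself. It assumes the network is two LazyLinear layers around a ReLU, fed inputs with 4 features (the 4 in the (8, 4) weight shape); tree_lines is a hypothetical helper written for this example, built on named_children() and named_parameters(recurse=False).

```python
import torch
from torch import nn

# Assumption: this net matches the tree (LazyLinear layers, 4 input features)
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
net(torch.rand(2, 4))  # one forward pass materializes the lazy weights

def tree_lines(module, name="net"):
    """Hypothetical helper: modules become nodes, parameters become leaves."""
    lines = [f"{name} ({type(module).__name__})"]
    # Parameters owned directly by this module, not by its children
    for pname, p in module.named_parameters(recurse=False):
        lines.append(f"  {pname} {tuple(p.shape)}")
    # Recurse into child modules and indent their lines one level
    for cname, child in module.named_children():
        lines.extend("  " + line for line in tree_lines(child, cname))
    return lines

print("\n".join(tree_lines(net)))
```

The ReLU shows up as a node with no leaves, exactly as in the diagram.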
Two access patterns:
net[2].weight: direct.
net.named_parameters(): walk the whole tree.
Frameworks give you both, plus serialization built on the same traversal.
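A minimal sketch comparing direct attribute access with a lookup through named_parameters() (assuming the same LazyLinear network with 4 input features) shows that both routes return the very same Parameter object, not a copy:

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
net(torch.rand(2, 4))  # materialize the lazy parameters

# Pattern 1: direct attribute access on one layer
direct = net[2].weight

# Pattern 2: walk the tree, then look the parameter up by its dotted name
walked = dict(net.named_parameters())["2.weight"]

print(direct is walked)  # True: one object, two ways to reach it
```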
Index into a Sequential like a list; each layer exposes its parameters as attributes, and net[2].state_dict() collects them by name:
OrderedDict([('weight',
tensor([[-0.1232, 0.1454, 0.2363, -0.3100, -0.1172, -0.2252, -0.2725, 0.2280]])),
('bias', tensor([0.3460]))])
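The OrderedDict above can be reproduced with a sketch like the following, assuming the LazyLinear network with 4 input features. Initialization is random, so the values change from run to run; only the keys and shapes are stable:

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
net(torch.rand(2, 4))  # materialize the lazy parameters

# state_dict() maps names (relative to this layer) to tensors
sd = net[2].state_dict()
print(list(sd.keys()))                        # ['weight', 'bias']
print([tuple(t.shape) for t in sd.values()])  # [(1, 8), (1,)]
```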
Each Linear layer holds two parameters: a weight matrix and a bias vector. Each one is a Parameter (in PyTorch; other frameworks use a similar wrapper), which carries the tensor value together with its gradient and extra metadata such as requires_grad.
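A short sketch of what the wrapper carries, assuming the same LazyLinear network: the gradient slot is empty until a backward pass fills it, and it then has the same shape as the weight.

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
X = torch.rand(2, 4)
net(X)  # materialize the lazy parameters

w = net[2].weight
print(isinstance(w, nn.Parameter), isinstance(w, torch.Tensor))  # True True
print(w.requires_grad, w.grad)  # True None: no backward pass yet

# After a backward pass the gradient lives on the parameter itself
net(X).sum().backward()
print(w.grad.shape)  # torch.Size([1, 8]), same shape as the weight
```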
In PyTorch, .data unwraps a parameter to a plain tensor for inspection. Here are the type of net[2].bias and its unwrapped value:
(torch.nn.parameter.Parameter, tensor([0.3460]))
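A sketch of what "unwraps" means in practice, assuming the LazyLinear network again: the result of .data is no longer a Parameter, sits outside autograd, and shares memory with the parameter rather than copying it. (In newer code, .detach() or a torch.no_grad() block is the more common way to get the same effect.)

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
net(torch.rand(2, 4))  # materialize the lazy parameters

b = net[2].bias
plain = b.data
print(isinstance(plain, nn.Parameter))   # False: just a tensor
print(plain.requires_grad)               # False: outside autograd
print(plain.data_ptr() == b.data_ptr())  # True: same memory, no copy
```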
To get everything at once, use named_parameters(). It walks the whole tree and yields (name, param) pairs at the leaves, with names given as dotted paths through the nesting. Listing each name with its shape:
[('0.weight', torch.Size([8, 4])),
('0.bias', torch.Size([8])),
('2.weight', torch.Size([1, 8])),
('2.bias', torch.Size([1]))]
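The listing above comes from a one-line comprehension over named_parameters(); a sketch (assuming the LazyLinear network) is below. The second half nests one Sequential inside another, an illustrative model not taken from the deck, to show each level of nesting adding a segment to the dotted path:

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
net(torch.rand(2, 4))  # materialize the lazy parameters

# (dotted name, shape) for every leaf parameter, in registration order
shapes = [(name, param.shape) for name, param in net.named_parameters()]
print(shapes)

# Illustrative nesting: a Sequential inside a Sequential
outer = nn.Sequential(nn.Sequential(nn.Linear(4, 8), nn.ReLU()), nn.Linear(8, 1))
print([name for name, _ in outer.named_parameters()])
# ['0.0.weight', '0.0.bias', '1.weight', '1.bias']
```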
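The same walk without the names is net.parameters(), the iterator that optim.SGD(net.parameters(), …) consumes. It also underlies state_dict(), the name-to-tensor mapping you pickle (for example with torch.save) when you save a checkpoint. Walk the tree once, use it many ways.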
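A sketch of one traversal feeding both the optimizer and a checkpoint round trip. It assumes the LazyLinear network, an arbitrary learning rate of 0.1, and an in-memory io.BytesIO buffer standing in for a checkpoint file; make_net is a hypothetical helper for this example.

```python
import io
import torch
from torch import nn

def make_net():
    """Hypothetical helper: build the net and materialize its lazy parameters."""
    net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
    net(torch.rand(2, 4))
    return net

net = make_net()

# parameters() yields the same leaves as named_parameters(), minus the names
assert len(list(net.parameters())) == len(list(net.named_parameters())) == 4
# state_dict() keys are those same dotted names (this net has no buffers)
assert list(net.state_dict()) == [n for n, _ in net.named_parameters()]

# The optimizer consumes the iterator and updates each leaf in place
opt = torch.optim.SGD(net.parameters(), lr=0.1)
net(torch.rand(2, 4)).sum().backward()
opt.step()

# Checkpoint: torch.save pickles the state_dict mapping
buf = io.BytesIO()
torch.save(net.state_dict(), buf)
buf.seek(0)

clone = make_net()  # fresh random weights
clone.load_state_dict(torch.load(buf))
X = torch.rand(2, 4)
print(torch.equal(net(X), clone(X)))  # True: same parameters, same outputs
```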
Reuse the same module instance at multiple positions in your architecture, and the framework treats them as one parameter set — same memory, gradients accumulate across uses.
Common cases:
# We need to give the shared layer a name so that we can refer to its
# parameters
shared = nn.LazyLinear(8)
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(),
shared, nn.ReLU(),
shared, nn.ReLU(),
nn.LazyLinear(1))
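X = torch.rand(size=(2, 4))  # a batch of 2 examples with 4 features each
net(X)  # one forward pass lets the LazyLinear layers infer their input sizes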
# Check whether the parameters are the same object (tied, not just equal)
assert net[2].weight is net[4].weight
net[2].weight.data[0, 0] = 100
# Modifying one affects the other since they share the same tensor
assert net[2].weight.data[0, 0] == net[4].weight.data[0, 0]
Modifying net[2].weight changes net[4].weight as well: they are the same tensor, not just equal.
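The identity check does not by itself show that gradients accumulate across uses. A sketch that does compares the tied network with an untied copy whose two middle layers start from identical values; the tied gradient should equal the sum of the two untied ones. It assumes explicit nn.Linear sizes in place of LazyLinear so the layers can be copied before any forward pass.

```python
import copy
import torch
from torch import nn

torch.manual_seed(0)
X = torch.rand(2, 4)

# Tied: one nn.Linear(8, 8) instance used at positions 2 and 4
shared = nn.Linear(8, 8)
tied = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                     shared, nn.ReLU(),
                     shared, nn.ReLU(),
                     nn.Linear(8, 1))

# Untied: deepcopy keeps the sharing, so replace position 4 with its own copy
untied = copy.deepcopy(tied)
untied[4] = copy.deepcopy(untied[2])

tied(X).sum().backward()
untied(X).sum().backward()

# The shared weight's gradient is the sum of the per-position gradients
expected = untied[2].weight.grad + untied[4].weight.grad
print(torch.allclose(tied[2].weight.grad, expected))  # True

# parameters() lists the shared layer only once
print(len(list(tied.parameters())), len(list(untied.parameters())))  # 6 8
```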
net[i].weight, net[i].bias, and their .grad: direct access to one layer.
named_parameters() / state_dict(): walk the whole tree.