import tensorflow as tf
A neural network is a tree of parameters: the weight matrices and bias vectors that gradient descent updates. Training is one thing you do with them; this deck covers the others.
A nested module is just a tree. Each module is a node; each parameter is a leaf:
net (Sequential)
├─ 0: Linear
│  ├─ weight (8, 4)
│  └─ bias (8,)
├─ 1: ReLU (no params)
└─ 2: Linear
   ├─ weight (1, 8)
   └─ bias (1,)
Two access patterns:
net[2].weight: direct.
named_parameters(): walks the whole tree.
Frameworks give you both, plus serialization built on the same traversal.
Index into a Sequential like a list; each layer exposes its parameters as attributes:
[<Variable path=sequential/dense_1/kernel, shape=(8, 1), dtype=float32, value=[[ 0.15129125]
[ 0.3634373 ]
[-0.5235572 ]
[-0.43507627]
[-0.41616136]
[ 0.6064105 ]
[-0.5269911 ]
[ 0.38409138]]>,
<Variable path=sequential/dense_1/bias, shape=(1,), dtype=float32, value=[0.]>]
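Two parameters per Linear layer: a weight matrix and a bias vector. Keras stores the kernel as (in, out), so the output above shows shape (8, 1) where PyTorch's weight would be (1, 8), as in the tree. The object returned is a Parameter (PyTorch) or a similar wrapper around the tensor, such as the Keras Variable shown above; in PyTorch it also carries the gradient and extra metadata.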
.data (PyTorch) unwraps the parameter to a plain tensor for inspection:
(keras.src.backend.Variable,
<tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>)
.grad is the gradient buffer: populated by backward(), otherwise None. It is useful for custom optimizers or for diagnosing dead neurons.
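A minimal, framework-free sketch of the gradient-buffer idea. `Param`, `backward_fill`, and `dead_units` are hypothetical names invented for illustration, not part of PyTorch or Keras. The sketch assumes each weight row is one output unit (the PyTorch (out, in) layout), so a row whose gradient is entirely zero got no learning signal on that batch.

```python
# Toy stand-in for a framework parameter: a value plus a gradient buffer.
# Param, backward_fill, and dead_units are hypothetical names, not a real API.

class Param:
    def __init__(self, data):
        self.data = data      # plain nested lists standing in for a tensor
        self.grad = None      # stays None until a backward pass fills it


def backward_fill(param, grads):
    """Pretend backward pass: store gradients with the same shape as data."""
    param.grad = grads


def dead_units(weight, tol=0.0):
    """Return row indices whose gradient is all zeros (within tol)."""
    if weight.grad is None:
        raise ValueError("no gradient yet: run a backward pass first")
    return [i for i, row in enumerate(weight.grad)
            if all(abs(g) <= tol for g in row)]


w = Param([[0.1, -0.2], [0.3, 0.4], [0.5, 0.6]])
grad_before = w.grad                        # None: nothing has run yet
backward_fill(w, [[0.7, -0.1], [0.0, 0.0], [0.2, 0.0]])
dead = dead_units(w)                        # only row 1 is all zeros
print(grad_before, dead)                    # None [1]
```

In a real framework the check would run on the actual gradient tensors after a backward pass; the row-wise "all zeros" test is the part that carries over.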
For everything-at-once, use named_parameters(). It walks the whole tree and yields (name, param) pairs at the leaves — names use dotted paths through the nesting:
[array([[ 0.16767919, 0.3248908 , 0.5219191 , 0.20520914, 0.47057313,
-0.1526336 , 0.6161278 , -0.5017144 ],
[ 0.3268463 , 0.52250785, 0.04043335, 0.6704082 , -0.22528923,
0.40379363, -0.63631016, -0.36722666],
[ 0.52791613, -0.19134533, -0.5103644 , 0.40397602, 0.18622148,
0.47954518, 0.20551544, 0.28378856],
...
[-0.43507627],
[-0.41616136],
[ 0.6064105 ],
[-0.5269911 ],
[ 0.38409138]], dtype=float32),
array([0.], dtype=float32)]
The same traversal, without the names, is the iterator that optim.SGD(net.parameters(), …) consumes. It is also what state_dict() collects when you save a checkpoint. Walk the tree once, use it many ways.
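The walk itself can be sketched without any framework. In this toy version, nested dicts stand in for modules and lists of numbers stand in for parameters. The tree layout and the helper functions are illustrative assumptions that borrow the framework method names; they are not the real implementations.

```python
import json

# Toy module tree mirroring the Sequential example: nested dicts are
# modules, lists of numbers are parameters (leaves). The layout and the
# helper names below are illustrative, not a framework API.
net = {
    "0": {"weight": [[0.0] * 4 for _ in range(8)], "bias": [0.0] * 8},
    "1": {},                                   # ReLU: no parameters
    "2": {"weight": [[0.0] * 8], "bias": [0.0]},
}


def named_parameters(module, prefix=""):
    """Depth-first walk yielding (dotted_name, leaf) pairs."""
    for key, child in module.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(child, dict):            # submodule: recurse
            yield from named_parameters(child, name)
        else:                                  # parameter: emit
            yield name, child


def parameters(module):
    """Same walk without names, the shape an optimizer would consume."""
    return (p for _, p in named_parameters(module))


def state_dict(module):
    """Flat name -> value mapping built from the same walk."""
    return dict(named_parameters(module))


names = [n for n, _ in named_parameters(net)]
print(names)   # ['0.weight', '0.bias', '2.weight', '2.bias']

# Round-trip the flat mapping through JSON as a stand-in for a checkpoint.
restored = json.loads(json.dumps(state_dict(net)))
print(restored == state_dict(net))   # True
```

All three helpers share one traversal; only what they keep from each (name, leaf) pair differs.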
Reuse the same module instance at multiple positions in your architecture, and the framework treats them as one parameter set — same memory, gradients accumulate across uses.
Common cases:
# Keras keeps both references to the shared layer in net.layers,
# but the shared layer's parameters are tied
shared = tf.keras.layers.Dense(8, activation=tf.nn.relu)
net = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(8, activation=tf.nn.relu),
    shared,
    shared,
    tf.keras.layers.Dense(1),
])
# Assumed example input (batch of 2, 4 features); the call builds the layers
X = tf.random.uniform((2, 4))
net(X)
# Check whether the parameters are the same object
print(net.layers[2].weights[0] is net.layers[3].weights[0])
True
Modify the weights through net.layers[2] and net.layers[3] reflects the same change: they are the same variable, not just equal copies.
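A framework-free sketch of why tied parameters share memory and accumulate gradients. `Param` and `Layer` are hypothetical scalar stand-ins, and the backward pass is written by hand for a chain of multiplications, so treat it as an illustration of the idea and not as how any framework implements it.

```python
# Toy illustration of parameter tying. Param and Layer are hypothetical
# stand-ins, not framework classes.

class Param:
    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class Layer:
    def __init__(self, w):
        self.weight = Param(w)

    def forward(self, x):
        return self.weight.value * x


shared = Layer(2.0)
layers = [Layer(3.0), shared, shared]     # same instance at positions 1 and 2

# Forward pass, remembering each layer's input for the backward pass.
x = 1.0
inputs = []
for layer in layers:
    inputs.append(x)
    x = layer.forward(x)

# Backward pass for y = w2 * (w1 * (w0 * x0)): d(y)/d(w_i) = upstream * input_i.
upstream = 1.0
for layer, inp in zip(reversed(layers), reversed(inputs)):
    layer.weight.grad += upstream * inp   # accumulates on the shared Param
    upstream *= layer.weight.value

same_object = layers[1].weight is layers[2].weight
unique = {id(l.weight) for l in layers}   # tied parameters count once
print(same_object, len(unique), shared.weight.grad)   # True 2 12.0
```

With the shared weight s = 2 the output is y = 3·s², so dy/ds = 6·s = 12, which is the sum of the two per-use contributions (6 + 6) that landed in the one shared buffer.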
net[i].weight, .bias, .grad: direct access.
named_parameters() / state_dict(): walk the whole tree.