Lazy initialization lets you declare a layer’s output size without specifying its input size.
The framework defers allocating the weight tensor until the first forward pass — when it has seen real data and can infer shapes from the upstream output.
Without lazy layers you write nn.Linear(in_features=20, out_features=256). With them you write nn.LazyLinear(256): less arithmetic, and fewer bugs when you change the architecture.
declare layer --> shapes UNKNOWN, no params yet
│
│ nn.LazyLinear(256)
▼
declare model --> same — placeholders
│
│ net = Sequential(...)
▼
forward(X) --> X.shape known → infer first layer
│ first layer output → second layer input
│ ... cascade through the model
▼
parameters allocated, model usable, optimizer can see them
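The cascade in the diagram can be sketched in plain Python. This is a toy model of the mechanism only, not any framework's implementation: `ToyLazyLinear` and `weight_shape` are hypothetical names, weights are nested lists, and activations are left out to keep the shape logic visible.

```python
import random


class ToyLazyLinear:
    """Hypothetical stand-in for a lazy linear layer (not a real framework class)."""

    def __init__(self, out_features):
        self.out_features = out_features
        self.weight = None  # placeholder: input size unknown, nothing allocated
        self.bias = None

    def __call__(self, rows):
        if self.weight is None:
            # First forward pass: infer the input size from the data itself.
            in_features = len(rows[0])
            self.weight = [[random.gauss(0.0, 0.01) for _ in range(self.out_features)]
                           for _ in range(in_features)]
            self.bias = [0.0] * self.out_features
        # y = x @ W + b, written out with plain lists.
        return [[sum(x * w_row[j] for x, w_row in zip(row, self.weight)) + self.bias[j]
                 for j in range(self.out_features)]
                for row in rows]


def weight_shape(layer):
    """Return (in_features, out_features), or None while still a placeholder."""
    if layer.weight is None:
        return None
    return (len(layer.weight), len(layer.weight[0]))


layers = [ToyLazyLinear(256), ToyLazyLinear(10)]
shapes_before = [weight_shape(layer) for layer in layers]

X = [[random.random() for _ in range(20)] for _ in range(2)]  # batch of 2, 20 features
out = X
for layer in layers:
    out = layer(out)  # each output feeds the next layer's shape inference

shapes_after = [weight_shape(layer) for layer in layers]
print(shapes_before)  # [None, None]
print(shapes_after)   # [(20, 256), (256, 10)]
```

Only the output sizes (256 and 10) appear in the declaration. The input sizes (20 and 256) are read off the data as it flows through.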
Hand-counting input dims is painful in real architectures: every layer needs its own in_features.
Pre-lazy code was full of “compute the flatten size by hand” comments such as $16 \cdot 5 \cdot 5 = 400$. Lazy init removes that bookkeeping: declare outputs, let inputs come from data.
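To make the bookkeeping concrete, here is the kind of arithmetic that ends in 400. The layer stack is an assumption for illustration (a LeNet-style network on a 28x28 input with 16 channels at the end); the text gives only the final product. `conv_out` is a hypothetical helper using the standard output-size formula.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed LeNet-style stack on a 28x28 input (illustrative, not taken from the text).
size = 28
size = conv_out(size, kernel=5, padding=2)   # conv 5x5, padding 2 -> 28
size = conv_out(size, kernel=2, stride=2)    # pool 2x2, stride 2  -> 14
size = conv_out(size, kernel=5)              # conv 5x5            -> 10
size = conv_out(size, kernel=2, stride=2)    # pool 2x2, stride 2  -> 5

channels = 16
flatten_size = channels * size * size
print(flatten_size)  # 400
```

Changing any kernel size, stride, or input resolution changes the result, which is why a hard-coded 400 goes stale so easily.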
Before any data has passed through, inspect the layers’ weights: they are placeholders, not allocated tensors, so each layer reports an empty list:
[[], []]
The framework has registered the intent to create a weight, but can’t allocate one until it sees the input shape.
Pass any tensor through. Now the framework knows X.shape == (2, 20) → first layer is Linear(20, 256) → second layer’s input is 256 → second is Linear(256, 10):
[(20, 256), (256,), (256, 10), (10,)]
After this, every layer has a concrete weight and bias that you can inspect, save, and optimize.
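A minimal PyTorch sketch of the same before/after check, assuming a two-layer network with a ReLU in between (the activation is not specified in the text). PyTorch stores a linear weight as (out_features, in_features), so the shapes print transposed relative to the listing above.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

# Before any data: the weight is a placeholder with no storage behind it.
is_placeholder = isinstance(net[0].weight, UninitializedParameter)

X = torch.rand(2, 20)
Y = net(X)  # first forward pass materializes every lazy layer

shapes = [tuple(p.shape) for p in net.parameters()]
print(is_placeholder)  # True
print(shapes)          # [(256, 20), (256,), (10, 256), (10,)]
```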
The trick combines naturally with custom init: run one forward pass to materialize the parameters, then run your initializer.
This is what d2l.Module.apply_init(...) does behind the scenes. The same pattern works for loading pretrained weights, swapping random init for a curated one, etc.
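The forward-then-initialize pattern might look like the sketch below in plain PyTorch. It is an illustration of the idea, not the d2l source: the free function `apply_init` and the initializer `init_normal` are hypothetical, and the standard deviation of 0.01 is an arbitrary choice.

```python
import torch
from torch import nn


def init_normal(module):
    """Hypothetical initializer: N(0, 0.01) weights, zero biases."""
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)
        nn.init.zeros_(module.bias)


def apply_init(net, inputs, init=None):
    """Forward once to materialize lazy parameters, then apply `init`."""
    with torch.no_grad():
        net(*inputs)
    if init is not None:
        net.apply(init)


net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
apply_init(net, [torch.rand(2, 20)], init_normal)
```

The order matters: an initializer needs a tensor with a concrete shape to fill, so it can only run once the forward pass has allocated the parameters.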
No hand-computed in_features for every layer in deep / variable-shape architectures.
Custom init: run a forward pass first, then apply_init.
Don’t build optim.SGD(net.parameters()) until parameters exist; pass data once first.
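A short sketch of the safe ordering in PyTorch: dry run first, optimizer second. The network, the input shape, and the learning rate of 0.1 are illustrative assumptions.

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
net(torch.rand(2, 20))  # dry run so that every parameter has a concrete shape

# Only now hand the parameters to the optimizer.
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
params = optimizer.param_groups[0]["params"]
n_tensors = len(params)
n_values = sum(p.numel() for p in params)
print(n_tensors, n_values)  # 4 7946
```

The count is 20·256 + 256 + 256·10 + 10 = 7946 scalars across four tensors, all visible to the optimizer because they were allocated before it was built.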