from d2l import torch as d2l
import torch
from torch import nn
from torch.nn import functional as F
torch.nn ships 100+ layers, but occasionally (a new architecture, an unusual normalization, a custom block) you need one the framework doesn’t have.
Writing one is trivial: subclass nn.Module, override forward. Two flavors:
Stateless: a pure transform, just forward.
Stateful: a custom Linear, low-rank weight, etc. Wrap learnable tensors in nn.Parameter.
The custom layer composes with built-ins automatically: Sequential, parameters(), to(device), checkpointing.
Subtract the row-wise mean from each input. Nothing to learn — pure transform:
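A minimal sketch of such a stateless layer, assuming a 2-D input of shape (batch, features). The class name `CenteredLayer` is illustrative, not taken from the text above.

```python
import torch
from torch import nn


class CenteredLayer(nn.Module):
    """Stateless layer: subtracts each row's mean from that row."""

    def forward(self, X):
        # keepdim=True keeps shape (batch, 1) so the mean broadcasts over features
        return X - X.mean(dim=1, keepdim=True)


layer = CenteredLayer()
X = torch.tensor([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
Y = layer(X)
print(Y)  # every row of Y now has mean zero
```

Because the layer defines no nn.Parameter, `layer.parameters()` is empty and there is nothing for an optimizer to update.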
Drop the custom layer into a Sequential like any other:
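One way this composition might look, as a sketch: the layer sizes (8 and 16) and the name `CenteredLayer` are assumptions chosen for illustration.

```python
import torch
from torch import nn


class CenteredLayer(nn.Module):
    """Stateless layer: subtracts each row's mean from that row."""

    def forward(self, X):
        return X - X.mean(dim=1, keepdim=True)


# The custom layer sits next to a built-in layer like any other module
net = nn.Sequential(nn.Linear(8, 16), CenteredLayer())

Y = net(torch.rand(4, 8))
row_means = Y.mean(dim=1)
print(Y.shape)                # torch.Size([4, 16])
print(row_means.abs().max())  # close to zero, up to floating-point error
```

Only the nn.Linear contributes parameters here, so `net.parameters()` yields just its weight and bias.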
Implement a fully-connected layer from scratch. The one important step: wrap learnable tensors in nn.Parameter so they’re auto-registered for training:
class MyDense(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        # Scaled init (Xavier-ish) keeps activations bounded on size-64 inputs
        self.weight = nn.Parameter(torch.randn(in_units, units) / in_units**0.5)
        self.bias = nn.Parameter(torch.zeros(units))

    def forward(self, X):
        # Affine transform followed by ReLU
        linear = torch.matmul(X, self.weight) + self.bias
        return F.relu(linear)
What nn.Parameter buys you
After linear = MyDense(5, 3):
linear.weight and linear.bias are tracked parameters.
linear.parameters() yields both; feed them to the optimizer.
state_dict() saves them; linear.to('cuda') moves them.
All for free, just by declaring nn.Parameter in __init__.
tensor([[0.5718, 0.1859, 0.0000],
[0.3187, 0.0000, 0.0000]], grad_fn=<ReluBackward0>)
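The registration behaviour described above can be checked directly. The sketch below repeats the MyDense layer so that it runs on its own, and adds a hypothetical `scratch` attribute (not part of the original layer) to contrast a plain tensor with an nn.Parameter.

```python
import torch
from torch import nn
from torch.nn import functional as F


class MyDense(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units) / in_units**0.5)
        self.bias = nn.Parameter(torch.zeros(units))
        # Hypothetical plain tensor, for contrast: NOT registered with the module
        self.scratch = torch.zeros(units)

    def forward(self, X):
        return F.relu(torch.matmul(X, self.weight) + self.bias)


linear = MyDense(5, 3)
names = [name for name, _ in linear.named_parameters()]
saved = list(linear.state_dict().keys())
print(names)  # ['weight', 'bias']
print(saved)  # ['weight', 'bias']; 'scratch' is absent

# parameters() can be handed straight to an optimizer
optimizer = torch.optim.SGD(linear.parameters(), lr=0.1)
loss = linear(torch.rand(2, 5)).sum()
loss.backward()
optimizer.step()
```

The plain `scratch` tensor is skipped by parameters(), state_dict(), and to(device), which is the behaviour nn.Parameter is there to avoid for learnable weights.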
Real-world cases that justify a custom layer:
Use register_buffer for non-trainable tensors that should still travel with the module (saved, moved to GPU, etc.).
A custom layer is an nn.Module subclass with a forward.
Stateless: just forward. Stateful: wrap learnable tensors in nn.Parameter.
Use register_buffer for non-trainable state that should still travel with the module.
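As an illustration of register_buffer, here is a sketch of a layer that multiplies its input by a fixed, non-trainable vector. The `FixedScale` class and its `scale` buffer are hypothetical names invented for this example.

```python
import torch
from torch import nn


class FixedScale(nn.Module):
    """Scales each feature by a constant that is stored but never trained."""

    def __init__(self, scale):
        super().__init__()
        # Buffer: included in state_dict() and moved by .to(), but not a parameter
        self.register_buffer("scale", torch.as_tensor(scale, dtype=torch.float32))

    def forward(self, X):
        return X * self.scale


layer = FixedScale([1.0, 0.5, 2.0])
params = list(layer.parameters())
saved = list(layer.state_dict().keys())
print(params)  # []
print(saved)   # ['scale']

Y = layer(torch.ones(2, 3))
print(Y)  # each row is [1.0, 0.5, 2.0]
```

The buffer is saved and moved with the module like a parameter, but an optimizer built from `layer.parameters()` never sees it.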