from d2l import mxnet as d2l
from mxnet import np, npx
from mxnet.gluon import nn
npx.set_np()

torch.nn ships 100+ layers, but occasionally (a new architecture, an unusual normalization, a custom block) you need one the framework doesn't have.
Writing one is trivial: subclass nn.Module and override forward. Two flavors:
Stateless: no parameters, just a transform in forward.
Stateful: a custom Linear, a low-rank weight, etc. Wrap learnable tensors in nn.Parameter.
The custom layer composes with built-ins automatically: Sequential, parameters(), to(device), checkpointing.
Subtract the row-wise mean from each input. Nothing to learn — pure transform:
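A minimal PyTorch sketch of such a stateless layer might look like the following. The class name CenteredLayer is hypothetical, and "row-wise mean" is read here as the mean over the last dimension of the input.

```python
import torch
from torch import nn


class CenteredLayer(nn.Module):  # hypothetical name, for illustration
    """Stateless layer: subtracts each row's mean from that row."""

    def forward(self, x):
        # Mean over the last dimension, kept as a column so that it
        # broadcasts back across each row.
        return x - x.mean(dim=-1, keepdim=True)


layer = CenteredLayer()
x = torch.tensor([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
y = layer(x)  # every row of y now has mean zero
```

Because the layer declares no parameters, layer.parameters() yields nothing and there is no state to save.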
Drop the custom layer into a Sequential like any other:
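As a sketch, assuming the same hypothetical CenteredLayer (redefined here so the snippet runs on its own) and arbitrarily chosen layer sizes:

```python
import torch
from torch import nn


class CenteredLayer(nn.Module):  # hypothetical stateless layer
    def forward(self, x):
        return x - x.mean(dim=-1, keepdim=True)


# The custom layer sits after a built-in Linear like any other module.
net = nn.Sequential(nn.Linear(8, 16), CenteredLayer())

out = net(torch.rand(4, 8))
row_means = out.mean(dim=-1)  # expected to be ~0 up to float rounding
```

The only parameters that net.parameters() reports are the weight and bias of the built-in Linear; the custom layer contributes none.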
Implement a fully-connected layer from scratch. The one important step: wrap learnable tensors in nn.Parameter so they’re auto-registered for training:
from mxnet import gluon

class MyDense(nn.Block):
    def __init__(self, units, in_units):
        super().__init__()
        self.weight = gluon.Parameter('weight', shape=(in_units, units))
        self.bias = gluon.Parameter('bias', shape=(units,))

    def forward(self, x):
        linear = np.dot(x, self.weight.data(ctx=x.ctx)) + self.bias.data(
            ctx=x.ctx)
        return npx.relu(linear)

What nn.Parameter buys you

After linear = MyLinear(5, 3):
linear.weight and linear.bias are tracked parameters.
linear.parameters() yields both, ready to feed to the optimizer.
state_dict() saves them; linear.to('cuda') moves them.
All for free, just by declaring nn.Parameter in __init__.
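The points above can be checked with a small PyTorch sketch. MyLinear is not defined in the text, so the version below is an assumption: it takes (in_units, units) in that order, uses a random normal initial weight, and applies ReLU to mirror the MyDense block.

```python
import torch
from torch import nn


class MyLinear(nn.Module):  # assumed definition, not given in the text
    """Fully connected layer followed by ReLU, written from scratch."""

    def __init__(self, in_units, units):
        super().__init__()
        # nn.Parameter registers these tensors with the module.
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.zeros(units))

    def forward(self, x):
        return torch.relu(x @ self.weight + self.bias)


linear = MyLinear(5, 3)
names = [name for name, _ in linear.named_parameters()]  # ['weight', 'bias']
optimizer = torch.optim.SGD(linear.parameters(), lr=0.1)
saved = linear.state_dict()     # holds 'weight' and 'bias'
out = linear(torch.rand(2, 5))  # shape (2, 3)
```

Had the tensors been stored as plain torch.Tensor attributes instead, named_parameters() would come back empty and the optimizer would have nothing to update.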
Real-world cases that justify a custom layer:
Use register_buffer for non-trainable tensors that should still travel with the module (saved, moved to GPU, etc.).

A custom layer is an nn.Module subclass with a forward.
Stateless: just forward. Stateful: wrap learnable tensors in nn.Parameter.
Use register_buffer for non-trainable state that should still travel with the module.
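The difference between nn.Parameter and register_buffer can be sketched as follows. The layer ScaledShift and its fixed scale of 0.5 are hypothetical, chosen only to put one learnable tensor and one buffer side by side.

```python
import torch
from torch import nn


class ScaledShift(nn.Module):  # hypothetical layer, for illustration
    def __init__(self, num_features):
        super().__init__()
        # Learnable: returned by parameters() and updated by the optimizer.
        self.shift = nn.Parameter(torch.zeros(num_features))
        # Not learnable, but still saved in state_dict() and moved by to().
        self.register_buffer("scale", torch.full((num_features,), 0.5))

    def forward(self, x):
        return x * self.scale + self.shift


layer = ScaledShift(4)
param_names = [name for name, _ in layer.named_parameters()]  # ['shift']
buffer_names = [name for name, _ in layer.named_buffers()]    # ['scale']
state_keys = sorted(layer.state_dict().keys())                # ['scale', 'shift']
```

Only shift would reach an optimizer built from layer.parameters(), while both tensors appear in the saved state.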