from d2l import mxnet as d2l
from mxnet import np, npx
from mxnet.gluon import nn, rnn
npx.set_np()

The same character-level LM, using the framework’s built-in rnn.RNN layer. The cell, the unroll, and the projection written from scratch boil down to a few lines:

rnn.RNN(num_hiddens) handles the recurrence, including hardware-accelerated cuDNN kernels on GPU.
The RNNLMScratch head is reused as is; it does not care whether the cell is hand-rolled.
Same Trainer, same gradient clipping, same data.
End result: faster training, roughly 5× fewer lines of code, identical mathematics.
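To make concrete what "handles the recurrence" means, here is a minimal NumPy sketch of the loop that a stock recurrent layer replaces. It is an illustration under stated assumptions, not the library's implementation: the names `rnn_forward`, `W_xh`, `W_hh`, and `b_h` are hypothetical, the input is assumed to be time-major (steps first, then batch), and a tanh activation is assumed. The built-in layer's default activation may differ, so check its documentation.

```python
import numpy as np

def rnn_forward(X, H, W_xh, W_hh, b_h):
    """Unroll a single-layer RNN over time (tanh activation is an assumption).

    X: inputs, shape (num_steps, batch_size, num_inputs), time-major.
    H: initial hidden state, shape (batch_size, num_hiddens).
    Returns every hidden state, shape (num_steps, batch_size, num_hiddens),
    together with the final hidden state.
    """
    outputs = []
    for X_t in X:  # one iteration per time step
        H = np.tanh(X_t @ W_xh + H @ W_hh + b_h)
        outputs.append(H)
    return np.stack(outputs), H

rng = np.random.default_rng(0)
num_steps, batch_size, num_inputs, num_hiddens = 5, 2, 8, 4
X = rng.normal(size=(num_steps, batch_size, num_inputs))
H0 = np.zeros((batch_size, num_hiddens))
W_xh = rng.normal(scale=0.01, size=(num_inputs, num_hiddens))
W_hh = rng.normal(scale=0.01, size=(num_hiddens, num_hiddens))
b_h = np.zeros(num_hiddens)

outputs, H_final = rnn_forward(X, H0, W_xh, W_hh, b_h)
print(outputs.shape, H_final.shape)  # (5, 2, 4) (2, 4)
```

The last slice of `outputs` equals the returned state, which is why a wrapper can hand back both the full sequence and the final hidden state from one call.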
The class below wraps the built-in RNN layer and hands the rest of the LM scaffold to the from-scratch base class:
class RNN(d2l.Module):
    """The RNN model implemented with high-level APIs."""
    def __init__(self, num_hiddens):
        super().__init__()
        self.save_hyperparameters()
        # Built-in recurrent layer; the input size is inferred on first use.
        self.rnn = rnn.RNN(num_hiddens)

    def forward(self, inputs, H=None):
        if H is None:
            # inputs is time-major, so shape[1] is the batch size.
            H, = self.rnn.begin_state(inputs.shape[1], ctx=inputs.ctx)
        outputs, (H, ) = self.rnn(inputs, (H, ))
        return outputs, H

An untrained model still runs: its predictions are random characters, but the shapes line up. This check isolates API wiring from learning quality.
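The wiring check described above can be sketched without any framework. The block below is a hypothetical NumPy stand-in, not the book's model: the toy vocabulary, the parameter names, the tanh activation, and the helper names `step` and `predict` are all assumptions made for illustration. It warms up on a prefix and then greedily generates characters from untrained weights, so the text is meaningless but every shape must agree for it to run.

```python
import numpy as np

# Toy vocabulary (an assumption; the real one comes from the dataset).
vocab = list("abcdefghijklmnopqrstuvwxyz ")
char_to_idx = {ch: i for i, ch in enumerate(vocab)}
vocab_size, num_hiddens = len(vocab), 32

rng = np.random.default_rng(0)
# Untrained parameters: small random weights and zero biases.
W_xh = rng.normal(scale=0.01, size=(vocab_size, num_hiddens))
W_hh = rng.normal(scale=0.01, size=(num_hiddens, num_hiddens))
b_h = np.zeros(num_hiddens)
W_hq = rng.normal(scale=0.01, size=(num_hiddens, vocab_size))
b_q = np.zeros(vocab_size)

def step(idx, H):
    """Advance one time step; return output logits and the new state."""
    X = np.eye(vocab_size)[idx]             # one-hot input, shape (vocab_size,)
    H = np.tanh(X @ W_xh + H @ W_hh + b_h)  # new state, shape (num_hiddens,)
    return H @ W_hq + b_q, H                # logits, shape (vocab_size,)

def predict(prefix, num_preds):
    """Warm up on the prefix, then greedily generate num_preds characters."""
    H = np.zeros(num_hiddens)
    out = [prefix[0]]
    for ch in prefix[1:]:                   # warm-up: feed the known characters
        _, H = step(char_to_idx[out[-1]], H)
        out.append(ch)
    for _ in range(num_preds):              # generation: feed back own output
        logits, H = step(char_to_idx[out[-1]], H)
        out.append(vocab[int(logits.argmax())])
    return "".join(out)

text = predict("it has", 20)
print(len(text), repr(text))
```

If any weight matrix had a mismatched dimension, the first call to `step` would raise an error, which is the kind of failure this check is meant to surface before training starts.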
Training uses the same Trainer as the from-scratch version, with gradient_clip_val=1 to clip gradients.
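As a rough illustration of what a clip value of 1 does, the sketch below rescales gradients so that their global L2 norm does not exceed the threshold. It assumes that gradient_clip_val acts as a cap on the norm taken over all parameters together; the function `clip_gradients` and the list-of-lists gradient layout are hypothetical and written in plain Python for clarity.

```python
import math

def clip_gradients(grads, clip_val):
    """Rescale gradients so their global L2 norm is at most clip_val.

    grads: list of gradient vectors (one list of floats per parameter).
    Returns the (possibly rescaled) gradients and the original norm.
    """
    norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if norm > clip_val:
        scale = clip_val / norm
        grads = [[g * scale for g in grad] for grad in grads]
    return grads, norm

grads = [[3.0, 4.0], [12.0]]  # global norm = sqrt(9 + 16 + 144) = 13
clipped, norm = clip_gradients(grads, clip_val=1.0)
new_norm = math.sqrt(sum(g * g for grad in clipped for g in grad))
print(norm, new_norm)
```

Rescaling every gradient by the same factor keeps the update direction and shortens only the step, which is why clipping is a common guard against exploding gradients in RNN training.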
rnn.RNN bundles the cell, the unroll, and (with cuDNN) GPU kernels in one stock layer.
rnn.LSTM, rnn.GRU, and similar layers are drop-in replacements with better long-range gradient behavior.