from d2l import torch as d2l
import torch
from torch import nn
from torch.nn import functional as F

The same character-level LM, using the framework’s built-in nn.RNN. The from-scratch cell, unroll, and projection boil down to a few lines:

- nn.RNN(input_size, hidden_size) handles the recurrence, including hardware-accelerated cuDNN kernels on GPU.
- The RNNLMScratch head is reused unchanged: it doesn’t care whether the cell is hand-rolled.
- Same Trainer, same gradient clipping, same data.

End result: faster training, ~5× fewer lines of code, identical mathematics.
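To make the "identical mathematics" point concrete, here is a minimal sketch that unrolls the recurrence by hand using the layer's own parameters and compares the result with what nn.RNN returns. The sizes are arbitrary illustrative values, and the check assumes the layer's defaults (tanh activation, one layer, time-major input).

```python
import torch
from torch import nn

torch.manual_seed(0)
# Illustrative sizes, not taken from the text.
num_steps, batch_size, input_size, hidden_size = 5, 2, 4, 3

rnn = nn.RNN(input_size, hidden_size)  # defaults: tanh, 1 layer, time-major
X = torch.randn(num_steps, batch_size, input_size)

with torch.no_grad():
    # Built-in path: outputs has shape (num_steps, batch_size, hidden_size).
    outputs, final_state = rnn(X)

    # Manual unroll with the same weights and biases.
    W_xh, W_hh = rnn.weight_ih_l0, rnn.weight_hh_l0
    b_xh, b_hh = rnn.bias_ih_l0, rnn.bias_hh_l0
    H = torch.zeros(batch_size, hidden_size)
    manual = []
    for X_t in X:  # iterate over time steps
        H = torch.tanh(X_t @ W_xh.T + b_xh + H @ W_hh.T + b_hh)
        manual.append(H)
    manual = torch.stack(manual)

matches = torch.allclose(outputs, manual, atol=1e-5)
print(tuple(outputs.shape), matches)
```

The two paths should agree up to floating-point tolerance; the built-in layer just performs the loop inside an optimized kernel.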
The built-in RNN cell replaces the hand-written one, and the rest of the LM scaffold is inherited from the from-scratch base class:
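As a rough, self-contained illustration of this wiring, the sketch below uses plain torch instead of the d2l base classes. The class name TinyRNNLM and all sizes are hypothetical; the original notebook inherits the head from RNNLMScratch rather than defining it like this.

```python
import torch
from torch import nn
from torch.nn import functional as F

class TinyRNNLM(nn.Module):  # hypothetical name, for illustration only
    """Built-in recurrent layer plus a linear projection to vocabulary logits."""

    def __init__(self, vocab_size, num_hiddens):
        super().__init__()
        self.vocab_size = vocab_size
        self.rnn = nn.RNN(vocab_size, num_hiddens)        # cell + unroll
        self.linear = nn.Linear(num_hiddens, vocab_size)  # output projection

    def forward(self, tokens, state=None):
        # tokens: (batch_size, num_steps) integer indices.
        # One-hot encode and move time to the first axis for nn.RNN.
        X = F.one_hot(tokens.T, self.vocab_size).float()
        hiddens, state = self.rnn(X, state)
        # Back to batch-major: (batch_size, num_steps, vocab_size).
        logits = self.linear(hiddens).swapaxes(0, 1)
        return logits, state

model = TinyRNNLM(vocab_size=28, num_hiddens=32)  # assumed sizes
tokens = torch.randint(0, 28, (4, 10))
logits, state = model(tokens)
print(tuple(logits.shape), tuple(state.shape))  # (4, 10, 28) (1, 4, 32)
```

Keeping the recurrent layer and the projection as separate attributes is what lets the cell be swapped without touching the head.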
The untrained model still runs: its predictions are arbitrary characters, but the shapes line up. This check isolates API wiring from learning quality:
'it hasoooooooooooooooooooo'
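The kind of check shown above can be sketched without d2l as a greedy decoding loop over an untrained layer. The toy vocabulary, the sizes, and the predict helper below are assumptions for illustration; the generated characters depend on the random initialization, so only the length and the prefix are predictable.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)
vocab = list("abcdefghijklmnopqrstuvwxyz ")  # toy vocabulary (assumption)
char_to_idx = {ch: i for i, ch in enumerate(vocab)}
num_hiddens = 32                             # assumed hidden size

rnn = nn.RNN(len(vocab), num_hiddens)        # untrained recurrent layer
head = nn.Linear(num_hiddens, len(vocab))    # untrained output projection

def predict(prefix, num_preds):
    """Greedy decoding: warm up on the prefix, then feed back the argmax."""
    state, out = None, [char_to_idx[prefix[0]]]
    with torch.no_grad():
        for i in range(len(prefix) + num_preds - 1):
            # One character as a (num_steps=1, batch=1, vocab) one-hot input.
            X = F.one_hot(torch.tensor([[out[-1]]]), len(vocab)).float()
            hidden, state = rnn(X, state)
            if i < len(prefix) - 1:
                # Warm-up: the next character is given by the prefix.
                out.append(char_to_idx[prefix[i + 1]])
            else:
                # Prediction: take the most likely next character.
                out.append(int(head(hidden).argmax(dim=2)))
    return "".join(vocab[i] for i in out)

text = predict("it has", 20)
print(len(text), repr(text[:6]))
```

If this runs and returns a string of the expected length, the input encoding, state passing, and output projection are wired correctly, regardless of how meaningless the characters are.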
Training uses the same Trainer, with gradient_clip_val=1 so that gradients are clipped before each optimizer step:
perplexity 7.4, 'time traveller and the time time t'
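For readers who want to see what one clipped training step and the reported perplexity amount to, here is a minimal sketch in plain torch. It uses torch.nn.utils.clip_grad_norm_ as a stand-in for whatever the Trainer does internally, random tokens instead of the real dataset, and assumed sizes and learning rate, so the numbers it prints say nothing about the result above.

```python
import math
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)
# Illustrative sizes (assumptions).
vocab_size, num_hiddens, batch_size, num_steps = 28, 32, 4, 10

rnn = nn.RNN(vocab_size, num_hiddens)
head = nn.Linear(num_hiddens, vocab_size)
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=1.0)  # assumed learning rate

# Random stand-in data: targets are the inputs shifted by one position.
tokens = torch.randint(0, vocab_size, (batch_size, num_steps + 1))
X, Y = tokens[:, :-1], tokens[:, 1:]

hiddens, _ = rnn(F.one_hot(X.T, vocab_size).float())
logits = head(hiddens).swapaxes(0, 1)  # (batch_size, num_steps, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), Y.reshape(-1))

optimizer.zero_grad()
loss.backward()
# Rescale all gradients so their global L2 norm is at most 1.
# The return value is the norm measured before clipping.
grad_norm = nn.utils.clip_grad_norm_(params, max_norm=1.0)
optimizer.step()

# Perplexity is the exponential of the mean cross-entropy per token.
perplexity = math.exp(loss.item())
print(f"pre-clip grad norm {grad_norm.item():.3f}, perplexity {perplexity:.1f}")
```

An untrained model on uniformly random tokens should give a perplexity in the neighborhood of the vocabulary size; training drives it down.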
The output looks like simple English-shaped text: the same character-level statistics the from-scratch version learned, in much less training time.
- nn.RNN bundles the cell, the unroll, and (with cuDNN) the GPU kernels in one stock layer.
- nn.LSTM, nn.GRU, etc. are drop-in replacements with better long-range gradient behavior.
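A short sketch of the swap, with arbitrary illustrative sizes: the three layers take the same constructor arguments and produce outputs of the same shape. One detail to keep in mind is that nn.LSTM returns its state as a tuple (hidden state and cell state), so code that inspects the state directly may need a small adjustment.

```python
import torch
from torch import nn

# Illustrative sizes (assumptions).
num_steps, batch_size, input_size, hidden_size = 5, 2, 4, 3
X = torch.randn(num_steps, batch_size, input_size)

shapes = {}
for layer_cls in (nn.RNN, nn.GRU, nn.LSTM):
    layer = layer_cls(input_size, hidden_size)  # same constructor signature
    outputs, state = layer(X)                   # same call pattern
    shapes[layer_cls.__name__] = tuple(outputs.shape)
    state_kind = "tuple (h, c)" if isinstance(state, tuple) else "tensor h"
    print(layer_cls.__name__, tuple(outputs.shape), state_kind)
# RNN (5, 2, 3) tensor h
# GRU (5, 2, 3) tensor h
# LSTM (5, 2, 3) tuple (h, c)
```

Because the output shape is unchanged, the projection layer that follows the recurrent layer needs no modification.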