Concise Implementation of Recurrent Neural Networks

Concise RNNs

The same character-level LM, now built on the framework’s built-in nn.RNN. The cell, the unrolling loop, and the output projection that were written from scratch boil down to a few lines:

  • nn.RNN(input_size, hidden_size) handles the recurrence, including hardware-accelerated cuDNN kernels on GPU.
  • Reuse the RNNLMScratch head — it doesn’t care whether the cell is hand-rolled.
  • Same Trainer, same gradient clipping, same data.

End result: faster training, ~5× fewer lines of code, identical mathematics.
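The "identical mathematics" claim can be checked directly. The sketch below is a minimal illustration, not part of the d2l code: it runs nn.RNN on random input and compares the result with a hand-rolled tanh recurrence that reuses the layer's own weights. All sizes are arbitrary values chosen for the example, and the input layout assumes the default batch_first=False.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Illustrative sizes only (assumptions for this sketch).
num_steps, batch_size, num_inputs, num_hiddens = 5, 3, 4, 8

rnn = nn.RNN(num_inputs, num_hiddens)  # single layer, tanh nonlinearity by default
X = torch.randn(num_steps, batch_size, num_inputs)  # (time, batch, features)

with torch.no_grad():
    outputs, H_final = rnn(X)  # hidden state defaults to zeros

    # Hand-rolled recurrence using the layer's own parameters.
    W_xh, W_hh = rnn.weight_ih_l0, rnn.weight_hh_l0
    b = rnn.bias_ih_l0 + rnn.bias_hh_l0
    H = torch.zeros(batch_size, num_hiddens)
    manual = []
    for X_t in X:  # iterate over time steps
        H = torch.tanh(X_t @ W_xh.T + H @ W_hh.T + b)
        manual.append(H)
    manual = torch.stack(manual)

print(outputs.shape, H_final.shape)
print(torch.allclose(outputs, manual, atol=1e-5))
```

outputs holds the hidden state at every time step; H_final holds only the last one, with an extra leading dimension for the number of layers.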

The model

The built-in layer supplies the recurrence; the rest of the LM scaffold is inherited from the from-scratch base class:

from d2l import torch as d2l
import torch
from torch import nn
from torch.nn import functional as F


class RNN(d2l.Module):
    """The RNN model implemented with high-level APIs."""
    def __init__(self, num_inputs, num_hiddens):
        super().__init__()
        self.save_hyperparameters()
        self.rnn = nn.RNN(num_inputs, num_hiddens)

    def forward(self, inputs, H=None):
        # Returns (outputs for every time step, final hidden state).
        return self.rnn(inputs, H)


class RNNLM(d2l.RNNLMScratch):
    """The RNN-based language model implemented with high-level APIs."""
    def init_params(self):
        # The input size is inferred on the first forward pass.
        self.linear = nn.LazyLinear(self.vocab_size)

    def output_layer(self, hiddens):
        # Project to vocabulary logits, then swap the first two axes.
        return d2l.swapaxes(self.linear(hiddens), 0, 1)
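A shape-only sketch of what output_layer does, written in plain PyTorch without d2l. It assumes the hidden states arrive time-major as (num_steps, batch_size, num_hiddens), which is what nn.RNN returns by default. torch.swapaxes stands in for d2l.swapaxes here, and the sizes are illustrative.

```python
import torch
from torch import nn

# Illustrative sizes only (assumptions for this sketch).
num_steps, batch_size, num_hiddens, vocab_size = 32, 4, 16, 28

hiddens = torch.randn(num_steps, batch_size, num_hiddens)  # time-major RNN outputs
linear = nn.LazyLinear(vocab_size)  # in_features is inferred on the first call

logits = linear(hiddens)                    # (num_steps, batch_size, vocab_size)
logits_bt = torch.swapaxes(logits, 0, 1)    # (batch_size, num_steps, vocab_size)

print(logits.shape, logits_bt.shape, linear.in_features)
```

The linear layer acts on the last axis only, so one call projects every time step at once; the swap just puts the batch axis first.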

Sanity check

An untrained model still runs: the predictions are meaningless (here a single repeated character), but the shapes line up. This check isolates API wiring from learning quality:

data = d2l.TimeMachine(batch_size=1024, num_steps=32)
rnn = RNN(num_inputs=len(data.vocab), num_hiddens=32)
model = RNNLM(rnn, vocab_size=len(data.vocab), lr=1)
model.predict('it has', 20, data.vocab)
'it hasoooooooooooooooooooo'
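For intuition about what a predict call of this kind involves, here is a self-contained sketch of greedy character decoding with an untrained nn.RNN. It is an assumption about the general technique (warm up on the prefix, then repeatedly feed back the argmax), not a copy of the d2l method. The toy vocabulary, the helper name greedy_predict, and the sizes are all hypothetical.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)
chars = list(' abcdefghijklmnopqrstuvwxyz')  # hypothetical toy vocabulary
tok2idx = {c: i for i, c in enumerate(chars)}
vocab_size, num_hiddens = len(chars), 32

rnn = nn.RNN(vocab_size, num_hiddens)        # untrained recurrent layer
linear = nn.Linear(num_hiddens, vocab_size)  # untrained output projection

def greedy_predict(prefix, num_preds):
    """Warm up on the prefix, then append the argmax character num_preds times."""
    state, out = None, [tok2idx[prefix[0]]]
    with torch.no_grad():
        for i in range(len(prefix) + num_preds - 1):
            # One-hot encode the latest token: shape (1 step, 1 sequence, vocab_size).
            X = F.one_hot(torch.tensor([[out[-1]]]), vocab_size).float()
            Y, state = rnn(X, state)
            if i < len(prefix) - 1:   # warm-up: feed the known prefix
                out.append(tok2idx[prefix[i + 1]])
            else:                     # prediction: take the most likely token
                out.append(int(linear(Y).argmax(dim=2)))
    return ''.join(chars[i] for i in out)

text = greedy_predict('it has', 20)
print(repr(text))
```

Because decoding is greedy and the weights are random, an untrained model tends to lock onto one or a few characters, which is consistent with the repeated output above.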

Training and decoding

Same Trainer, with gradient_clip_val=1 passed to the Trainer itself (clipping acts on the gradients before each optimizer step):

trainer = d2l.Trainer(max_epochs=100, gradient_clip_val=1, num_gpus=1)
trainer.fit(model, data)
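What a clip value of 1 means in practice, as a hedged sketch: assuming the Trainer applies global-norm clipping, gradients are rescaled whenever their combined L2 norm exceeds the threshold. The example uses PyTorch's torch.nn.utils.clip_grad_norm_ as a stand-in for the Trainer's own routine. The helper global_grad_norm and the artificially large loss are made up for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
rnn = nn.RNN(4, 8)            # illustrative sizes
X = torch.randn(5, 3, 4)      # (time, batch, features)

outputs, _ = rnn(X)
loss = (100 * outputs).pow(2).sum()  # deliberately large loss, hence large gradients
loss.backward()

def global_grad_norm(module):
    """L2 norm of all parameter gradients taken together (hypothetical helper)."""
    return torch.sqrt(sum((p.grad ** 2).sum() for p in module.parameters()))

norm_before = float(global_grad_norm(rnn))
torch.nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=1.0)
norm_after = float(global_grad_norm(rnn))

print(f'before {norm_before:.1f}, after {norm_after:.3f}')
```

Clipping rescales all gradients by the same factor, so the update direction is preserved and only its length is capped.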

ppl = float(model.board.data['val_ppl'][-1].y)
pred = model.predict('time traveller', 20, data.vocab, d2l.try_gpu())
print(f'perplexity {ppl:.1f}, {pred!r}')
perplexity 7.4, 'time traveller and the time time t'

The output looks like simple English-shaped text: the same character-level statistics the from-scratch version learned, in much less training time.
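To put the reported perplexity in context: perplexity is the exponential of the average cross-entropy per token. The sketch below assumes a 28-token vocabulary purely for illustration and computes two reference points, a uniform guesser and a near-perfect predictor. The targets are arbitrary.

```python
import torch
from torch.nn import functional as F

vocab_size = 28  # illustrative value (an assumption for this sketch)
targets = torch.tensor([0, 5, 9, 13, 20, 27])  # arbitrary target token indices

# Uniform prediction: identical logits for every token.
uniform_logits = torch.zeros(len(targets), vocab_size)
ppl_uniform = float(torch.exp(F.cross_entropy(uniform_logits, targets)))

# Confident and correct prediction: high logit on the target, low elsewhere.
confident_logits = torch.full((len(targets), vocab_size), -10.0)
confident_logits[torch.arange(len(targets)), targets] = 10.0
ppl_confident = float(torch.exp(F.cross_entropy(confident_logits, targets)))

print(f'uniform {ppl_uniform:.2f}, confident {ppl_confident:.4f}')
```

A uniform guesser scores a perplexity equal to the vocabulary size and a perfect model scores 1, so a value of 7.4 sits between the two: roughly as uncertain as picking among about seven equally likely characters.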

Recap

  • nn.RNN is the cell + unroll + (with cuDNN) GPU kernels in one stock layer.
  • Reuse the from-scratch LM wrapper — only the cell changes.
  • Same scaffold accepts nn.LSTM, nn.GRU, etc. — drop-in replacements with better long-range gradient behavior.
  • The framework version trains noticeably faster than the from-scratch version on the same hardware.
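A quick sketch of how close to drop-in the other stock layers are, using plain PyTorch and illustrative sizes. All three take the same constructor arguments and return per-step outputs of the same shape. The one difference to watch for is that nn.LSTM returns its state as a tuple (hidden state, cell state) rather than a single tensor.

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(5, 3, 4)  # (time, batch, features), illustrative sizes

shapes, state_kinds = {}, {}
for layer_cls in (nn.RNN, nn.GRU, nn.LSTM):
    layer = layer_cls(4, 8)       # same (input_size, hidden_size) signature
    outputs, state = layer(X)     # same call pattern for all three
    shapes[layer_cls.__name__] = tuple(outputs.shape)
    state_kinds[layer_cls.__name__] = 'tuple' if isinstance(state, tuple) else 'tensor'

for name in shapes:
    print(name, shapes[name], state_kinds[name])
```

Code that only passes the state back into the layer unchanged works for all three. Code that inspects or reshapes the state needs a small branch for the LSTM tuple.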