Concise Implementation of Recurrent Neural Networks

Concise RNNs

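The same character-level language model, now built on the framework’s built-in recurrent layer (tf.keras.layers.SimpleRNN in the TensorFlow code used here). The cell, the unrolling over time steps, and the output projection written from scratch boil down to a few lines:

  • tf.keras.layers.SimpleRNN(num_hiddens) handles the recurrence and the unrolling over time steps; the framework runs it on the GPU when one is available.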
  • Reuse the RNNLMScratch head — it doesn’t care whether the cell is hand-rolled.
  • Same Trainer, same gradient clipping, same data.

End result: faster training, ~5× fewer lines of code, identical mathematics.
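
To make the "identical mathematics" point concrete, here is a minimal plain-Python sketch of the recurrence a simple RNN layer evaluates, assuming a tanh activation and a bias term. The names rnn_step, rnn_unroll, W_xh, W_hh and b_h are illustrative only and do not come from the framework or from d2l.

```python
import math

def rnn_step(x, h, W_xh, W_hh, b_h):
    """One time step for one example: h_new = tanh(x @ W_xh + h @ W_hh + b_h)."""
    h_new = []
    for j in range(len(b_h)):
        z = b_h[j]
        z += sum(x[i] * W_xh[i][j] for i in range(len(x)))
        z += sum(h[k] * W_hh[k][j] for k in range(len(h)))
        h_new.append(math.tanh(z))
    return h_new

def rnn_unroll(xs, h0, W_xh, W_hh, b_h):
    """Apply rnn_step along the sequence; return every hidden state and the last one."""
    outputs, h = [], h0
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        outputs.append(h)
    return outputs, h

# Toy sizes (illustrative): 2 input features, 3 hidden units, 4 time steps.
W_xh = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
W_hh = [[0.2, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.2]]
b_h = [0.0, 0.0, 0.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
outputs, h_last = rnn_unroll(xs, [0.0, 0.0, 0.0], W_xh, W_hh, b_h)
print(len(outputs), len(outputs[0]))  # 4 3
```

The built-in layer runs this loop for a whole minibatch at once. In Keras terms, outputs corresponds to what return_sequences=True yields and h_last to the extra value returned with return_state=True.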

The model

The built-in layer supplies the recurrence; the rest of the language-model scaffold is inherited from the from-scratch base class d2l.RNNLMScratch:

from d2l import tensorflow as d2l
import tensorflow as tf


class RNN(d2l.Module):
    """The RNN model implemented with high-level APIs."""
    def __init__(self, num_hiddens):
        super().__init__()
        self.save_hyperparameters()
        # return_sequences: emit the hidden state at every time step;
        # return_state: also return the final hidden state.
        self.rnn = tf.keras.layers.SimpleRNN(
            num_hiddens, return_sequences=True, return_state=True)

    def forward(self, inputs, H=None):
        # inputs: (time_steps, batch_size, features). The Keras layer expects
        # (batch_size, time_steps, features), so swap the first two axes.
        outputs, H = self.rnn(tf.transpose(inputs, perm=[1, 0, 2]), H)
        # Back to (time_steps, batch_size, num_hiddens) for the LM scaffold.
        return tf.transpose(outputs, perm=[1, 0, 2]), H


class RNNLM(d2l.RNNLMScratch):
    """The RNN-based language model implemented with high-level APIs."""
    def init_params(self):
        # Output projection from hidden states to vocabulary logits.
        self.linear = tf.keras.layers.Dense(self.vocab_size)

    def output_layer(self, hiddens):
        # (time_steps, batch_size, vocab_size) -> (batch_size, time_steps, vocab_size)
        return d2l.transpose(self.linear(hiddens), (1, 0, 2))
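
The transposes in the code above only swap the first two axes. As a shape-level illustration in plain Python, with nested lists standing in for tensors (swap_time_and_batch and shape are hypothetical helpers, not part of TensorFlow or d2l):

```python
def swap_time_and_batch(x):
    """Swap axes 0 and 1 of a 3-level nested list, like perm=[1, 0, 2]."""
    return [list(rows) for rows in zip(*x)]

def shape(x):
    """Return (dim0, dim1, dim2) of a rectangular 3-level nested list."""
    return (len(x), len(x[0]), len(x[0][0]))

# Time-major toy input: 4 time steps, batch of 2, 3 features per step.
time_major = [[[float(t), float(b), 0.0] for b in range(2)] for t in range(4)]
batch_major = swap_time_and_batch(time_major)

print(shape(time_major))   # (4, 2, 3)
print(shape(batch_major))  # (2, 4, 3)
```

Applying the swap twice restores the original layout, which is why forward can hand time-major outputs back to the rest of the scaffold.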

Sanity check

Untrained model still runs — predictions are random characters, but shapes line up. This check isolates API wiring from learning quality:

data = d2l.TimeMachine(batch_size=1024, num_steps=32)
rnn = RNN(num_hiddens=32)
model = RNNLM(rnn, vocab_size=len(data.vocab), lr=1)
model.predict('it has', 20, data.vocab)
'it hasfaoqbguk<unk>kwvyk<unk>jcqbx'

Training and decoding

Same Trainer as before, with gradient_clip_val=1 passed to the Trainer:

with d2l.try_gpu():
    trainer = d2l.Trainer(max_epochs=100, gradient_clip_val=1)
trainer.fit(model, data)
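
What gradient_clip_val=1 does can be sketched in plain Python, assuming the Trainer applies the same global-norm clipping as the from-scratch version. The helper clip_by_global_norm below is illustrative and is not the d2l implementation.

```python
import math

def clip_by_global_norm(grads, clip_val):
    """Rescale all gradients together if their joint L2 norm exceeds clip_val."""
    norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if norm > clip_val:
        scale = clip_val / norm
        return [[g * scale for g in grad] for grad in grads], norm
    # Below the threshold the gradients pass through unchanged (copied).
    return [list(grad) for grad in grads], norm

# Two toy "parameter gradients" (flat lists) whose joint norm is 13.
grads = [[3.0, 4.0], [12.0]]
clipped, norm_before = clip_by_global_norm(grads, clip_val=1.0)
norm_after = math.sqrt(sum(g * g for grad in clipped for g in grad))
print(norm_before, round(norm_after, 6))  # 13.0 1.0
```

Because every gradient is scaled by the same factor, the update direction is preserved and only its length is capped.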

ppl = float(model.board.data['val_ppl'][-1].y)
pred = model.predict('time traveller', 20, data.vocab)
print(f'perplexity {ppl:.1f}, {pred!r}')
perplexity 7.3, 'time traveller and the time travel'

The output looks like simple English-shaped text: the same character-level statistics the from-scratch version learned, in less training time.
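
For reading the reported number: perplexity is the exponential of the average per-character cross-entropy. A minimal plain-Python sketch follows, where perplexity is a hypothetical helper and the vocabulary size of 28 is an illustrative assumption rather than a value read from data.vocab.

```python
import math

def perplexity(target_probs):
    """exp of the mean negative log-probability assigned to the true next tokens."""
    nll = [-math.log(p) for p in target_probs]
    return math.exp(sum(nll) / len(nll))

vocab_size = 28  # illustrative assumption, not read from data.vocab

# A model that guesses uniformly has perplexity equal to the vocabulary size.
uniform = perplexity([1.0 / vocab_size] * 100)

# A model that puts more mass on the correct characters scores lower.
confident = perplexity([0.5, 0.25, 0.125, 0.125])

print(round(uniform, 3), round(confident, 3))  # approximately 28.0 and 4.757
```

Read this way, a perplexity of 7.3 means the trained model is about as uncertain as a uniform choice among roughly seven characters at each step.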

Recap

  • The built-in recurrent layer (tf.keras.layers.SimpleRNN here) is the cell + unroll in one stock layer, with GPU execution handled by the framework.
  • Reuse the from-scratch LM wrapper — only the cell changes.
  • Same scaffold accepts gated layers such as tf.keras.layers.LSTM and tf.keras.layers.GRU: replacements for the simple layer with better long-range gradient behavior.
  • The framework version trains noticeably faster than the from-scratch version on the same hardware.