from d2l import tensorflow as d2l
import tensorflow as tf

The same character-level LM, using the framework’s built-in RNN layer (nn.RNN in PyTorch; tf.keras.layers.SimpleRNN in the TensorFlow code here). The cell, unrolling, and projection written from scratch boil down to a few lines:
- nn.RNN(input_size, hidden_size) handles the recurrence, including hardware-accelerated cuDNN kernels on GPU.
- The RNNLMScratch head is reused; it doesn’t care whether the cell is hand-rolled.
- Same Trainer, same gradient clipping, same data.
- End result: faster training, ~5× fewer lines of code, identical mathematics.
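The "identical mathematics" claim can be checked directly. The following is a minimal sketch, assuming TensorFlow's tf.keras.layers.SimpleRNN with its default tanh activation; the sizes and variable names are illustrative and not taken from the original. It runs the built-in layer, then replays the same recurrence by hand with the layer's own weights.

```python
import numpy as np
import tensorflow as tf

# Illustrative sizes (assumptions, not from the original text).
batch_size, num_steps, num_inputs, num_hiddens = 2, 5, 4, 3
X = tf.random.normal((batch_size, num_steps, num_inputs))

# Built-in layer: the recurrence and the unrolling happen inside the call.
layer = tf.keras.layers.SimpleRNN(num_hiddens, return_sequences=True)
builtin_out = layer(X)

# Hand-rolled loop that reuses the layer's own weights:
# kernel (inputs -> hidden), recurrent_kernel (hidden -> hidden), bias.
W_xh, W_hh, b_h = layer.get_weights()
H = np.zeros((batch_size, num_hiddens), dtype=np.float32)
manual_steps = []
for t in range(num_steps):
    # H_t = tanh(X_t W_xh + H_{t-1} W_hh + b_h)
    H = np.tanh(X[:, t].numpy() @ W_xh + H @ W_hh + b_h)
    manual_steps.append(H)
manual_out = np.stack(manual_steps, axis=1)

# Largest elementwise gap between the two computations.
max_abs_diff = float(np.max(np.abs(builtin_out.numpy() - manual_out)))
print(max_abs_diff)
```

The two results should agree up to floating-point rounding, which is why the same head, Trainer, and data can be reused unchanged.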
Wrap the built-in RNN layer; the rest of the LM scaffold is handed off to the from-scratch base class:
class RNN(d2l.Module):
    """The RNN model implemented with high-level APIs."""
    def __init__(self, num_hiddens):
        super().__init__()
        self.save_hyperparameters()
        self.rnn = tf.keras.layers.SimpleRNN(
            num_hiddens, return_sequences=True, return_state=True)

    def forward(self, inputs, H=None):
        # inputs: (time_steps, batch_size, features) -> (batch_size, time_steps, features)
        outputs, H = self.rnn(tf.transpose(inputs, perm=[1, 0, 2]), H)
        return tf.transpose(outputs, perm=[1, 0, 2]), H

The untrained model still runs: predictions are random characters, but the shapes line up. This check isolates API wiring from learning quality:
'it hasfaoqbguk<unk>kwvyk<unk>jcqbx'
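The shape side of this check can also be reproduced without the d2l scaffold. The sketch below is a standalone approximation of the forward pass above, assuming one-hot inputs in time-major layout; the sizes (including a vocabulary of 28 symbols) are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes, chosen only for illustration.
num_steps, batch_size, vocab_size, num_hiddens = 6, 3, 28, 16

# Random token ids in time-major layout, then one-hot encoded:
# (num_steps, batch_size, vocab_size)
tokens = tf.random.uniform(
    (num_steps, batch_size), maxval=vocab_size, dtype=tf.int32)
inputs = tf.one_hot(tokens, vocab_size)

rnn = tf.keras.layers.SimpleRNN(
    num_hiddens, return_sequences=True, return_state=True)

# Keras expects batch-major input, so transpose in and back out.
outputs, H = rnn(tf.transpose(inputs, perm=[1, 0, 2]))
outputs = tf.transpose(outputs, perm=[1, 0, 2])

print(outputs.shape)  # one hidden vector per time step and sequence
print(H.shape)        # final hidden state per sequence

# With return_state=True, the returned state is the last step's output.
state_matches_last_step = bool(
    np.allclose(outputs[-1].numpy(), H.numpy()))
```

Nothing here depends on trained weights, so a failure at this stage points to a layout or wiring mistake rather than to learning.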
Training uses the same Trainer, with gradient clipping set by gradient_clip_val=1:
perplexity 7.3, 'time traveller and the time travel'
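To make the gradient_clip_val=1 setting concrete, here is a minimal pure-Python sketch of clipping by global norm. It assumes the Trainer clips the joint L2 norm of all gradients, which the text does not spell out; the function name and the toy gradients are hypothetical.

```python
import math

def clip_by_global_norm(grads, clip_val):
    """Rescale all gradients so their joint L2 norm is at most clip_val.

    grads is a list of flat lists of floats, one list per parameter.
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if norm <= clip_val:
        # Already within the budget: return an unchanged copy.
        return [list(grad) for grad in grads], norm
    scale = clip_val / norm
    return [[g * scale for g in grad] for grad in grads], norm

# Toy gradients for two parameters; their joint norm is 5.
grads = [[3.0, 0.0], [0.0, 4.0]]
clipped, norm_before = clip_by_global_norm(grads, clip_val=1.0)
norm_after = math.sqrt(sum(g * g for grad in clipped for g in grad))
print(norm_before, norm_after, clipped)
```

Every gradient is scaled by the same factor, so the update direction is preserved and only its length is capped.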
The output looks like simple, English-shaped text: the same character-level statistics the from-scratch version learned, in much less training time.
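The perplexity reported for this run can be read as an effective number of equally likely next characters. The sketch below assumes the usual definition, the exponential of the average negative log-likelihood per character; the probabilities and the 28-symbol vocabulary are made-up illustrations.

```python
import math

def perplexity(true_char_probs):
    """exp of the average negative log-probability of the true characters."""
    avg_nll = -sum(math.log(p) for p in true_char_probs) / len(true_char_probs)
    return math.exp(avg_nll)

# Blind guessing over a hypothetical 28-symbol vocabulary.
uniform_28 = perplexity([1 / 28] * 100)

# A model that always puts probability 0.9 on the true character.
confident = perplexity([0.9] * 100)

# Average per-character loss (in nats) implied by a perplexity of 7.3.
loss_at_7_3 = math.log(7.3)

print(uniform_28, confident, loss_at_7_3)
```

Under this definition, a perplexity of 7.3 corresponds to an average loss of about 1.99 nats per character, well below what uniform guessing over a few dozen symbols would give.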
- nn.RNN bundles the cell, the unrolling, and (with cuDNN) GPU kernels in one stock layer.
- nn.LSTM, nn.GRU, etc. are drop-in replacements with better long-range gradient behavior.
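In the TensorFlow code used here, the corresponding swap is between tf.keras.layers.SimpleRNN, GRU, and LSTM. The following sketch assumes the same return_sequences=True, return_state=True configuration as the RNN class above; the helper name run_layer and the sizes are hypothetical.

```python
import tensorflow as tf

# Batch-major toy input: (batch_size, num_steps, features).
X = tf.random.normal((3, 6, 28))

def run_layer(layer_cls, num_hiddens=16):
    """Build a recurrent layer of the given class and run it on X."""
    layer = layer_cls(num_hiddens, return_sequences=True, return_state=True)
    # The first result is the per-step output; the rest are state tensors.
    outputs, *state = layer(X)
    return outputs, state

rnn_out, rnn_state = run_layer(tf.keras.layers.SimpleRNN)
gru_out, gru_state = run_layer(tf.keras.layers.GRU)
lstm_out, lstm_state = run_layer(tf.keras.layers.LSTM)

for name, out, state in [("SimpleRNN", rnn_out, rnn_state),
                         ("GRU", gru_out, gru_state),
                         ("LSTM", lstm_out, lstm_state)]:
    print(name, tuple(out.shape), len(state))
```

The per-step outputs have the same shape for all three layers, so the projection head is unaffected. One caveat: LSTM returns two state tensors (hidden state and cell state), so a forward method that unpacks exactly one state needs a small adjustment.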