from d2l import tensorflow as d2l
import tensorflow as tf

A recurrent neural network carries a hidden state \mathbf{h}_t across time steps, a learned summary of all input seen so far:
\mathbf{h}_t = \phi(\mathbf{W}_{xh}\mathbf{x}_t + \mathbf{W}_{hh}\mathbf{h}_{t-1} + \mathbf{b}).
The same weights are reused at every step, so the parameter count stays constant regardless of sequence length. In principle the hidden state can carry context from arbitrarily far back, with no fixed-size window as in n-gram models.
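The recurrence can be sketched as a short loop. This is a minimal illustration, not the implementation built in later sections: the sizes, the choice of tanh for \phi, and the helper name rnn_forward are all assumptions made for the example. It uses the batched row-vector convention (inputs multiply the weights from the left), which is the transpose of the column-vector form in the equation.

```python
import tensorflow as tf

# Illustrative sizes (assumptions, not taken from the text).
batch_size, num_inputs, num_hiddens = 2, 3, 4

# One set of parameters, shared by every time step.
W_xh = tf.random.normal((num_inputs, num_hiddens)) * 0.1
W_hh = tf.random.normal((num_hiddens, num_hiddens)) * 0.1
b = tf.zeros((num_hiddens,))

def rnn_forward(inputs, H):
    """Hypothetical helper: apply the recurrence to inputs of shape
    (num_steps, batch_size, num_inputs) and return the final hidden state."""
    for X in tf.unstack(inputs, axis=0):  # one (batch_size, num_inputs) slice per step
        H = tf.tanh(tf.matmul(X, W_xh) + tf.matmul(H, W_hh) + b)
    return H

H0 = tf.zeros((batch_size, num_hiddens))
short_seq = tf.random.normal((5, batch_size, num_inputs))
long_seq = tf.random.normal((50, batch_size, num_inputs))

# The same parameters handle both lengths; only the loop count changes.
H_short = rnn_forward(short_seq, H0)
H_long = rnn_forward(long_seq, H0)
num_params = sum(int(tf.size(p)) for p in (W_xh, W_hh, b))
print(H_short.shape, H_long.shape, num_params)
```

With these sizes the model has 3*4 + 4*4 + 4 = 32 parameters, whether the sequence has 5 steps or 50.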
Figure: An RNN with a hidden state.
The naive form: two matrix multiplies, summed:
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[-2.190528 , -1.6804276, -2.6916413,  0.0785881],
       [-1.4864042,  1.7159821, -1.6885962,  0.6511531],
       [-1.1071088, -2.559943 , -1.0138488,  0.7972275]], dtype=float32)>
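Equivalently, concatenate the input and hidden state along columns, concatenate the two weight matrices along rows, and multiply once. The result is the same up to floating-point rounding, with one matmul instead of two: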
<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[-2.1905282 , -1.6804273 , -2.6916416 ,  0.0785881 ],
       [-1.4864042 ,  1.7159821 , -1.6885962 ,  0.6511531 ],
       [-1.1071088 , -2.559943  , -1.0138489 ,  0.79722756]],
      dtype=float32)>
The “concat then multiply” form is a common trick in framework RNN implementations, since it replaces two matrix multiplications with one.
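The equivalence of the two forms can be checked directly. The sketch below assumes a batch of 3, a single input feature, and 4 hidden units. Only the (3, 4) result shape is taken from the outputs shown above. The other dimensions and the variable names are illustrative, and because the values are random they will not match the printed numbers.

```python
import tensorflow as tf

# Shapes are assumptions: batch of 3, 1 input feature, 4 hidden units.
X, W_xh = tf.random.normal((3, 1)), tf.random.normal((1, 4))
H, W_hh = tf.random.normal((3, 4)), tf.random.normal((4, 4))

# Naive form: two matrix multiplications, then a sum.
two_matmuls = tf.matmul(X, W_xh) + tf.matmul(H, W_hh)

# Fused form: concatenate X and H along columns (axis 1) and
# W_xh and W_hh along rows (axis 0), then multiply once.
XH = tf.concat((X, H), 1)        # shape (3, 5)
W = tf.concat((W_xh, W_hh), 0)   # shape (5, 4)
one_matmul = tf.matmul(XH, W)

# Largest elementwise disagreement between the two results.
max_diff = float(tf.reduce_max(tf.abs(two_matmuls - one_matmul)))
print(two_matmuls.shape, max_diff)
```

This is the block-matrix identity [X, H] [W_xh; W_hh] = X W_xh + H W_hh. Any remaining difference is float32 rounding, consistent with the last-digit discrepancies between the two printed tensors.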
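Character-level language modeling uses the same RNN: the input is “machin” and the target is “achine”, the same sequence shifted by one character, so at each step the model predicts the next character.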
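The shift-by-one pairing can be written out in a few lines. This is only an illustration of how inputs and targets line up, using the word from the example. The variable names are hypothetical.

```python
# Next-character targets: the target sequence is the input shifted by one.
text = "machine"
inputs, targets = text[:-1], text[1:]

# At step t the model reads inputs[t] and is trained to predict targets[t].
pairs = list(zip(inputs, targets))
for t, (x, y) in enumerate(pairs):
    print(f"step {t}: input {x!r} -> target {y!r}")
```

Each target character becomes the input at the following step, which is why a single sequence of length 7 yields 6 training pairs.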
The next two sections build this end to end: first from scratch, then in a concise implementation.