A tensor is an n-dimensional array of numbers, the fundamental data structure for everything that follows in this book.
Think of it as NumPy's ndarray, but GPU-accelerated and differentiable. In this section: how to create, reshape, index, operate on, and share memory with tensors.
A single import wires up the framework's tensor library:
import torch
Two things you'll reach for constantly:
.numel(): the total number of elements
.shape: the size along each axis (a torch.Size, which is a tuple)
12
torch.Size([12])
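A minimal sketch that produces the two outputs above, assuming the 12-element vector is named x and was created with torch.arange as float32 (both the name and the dtype are assumptions). Note that numel is a method call while shape is a plain attribute:

```python
import torch

# Assumed setup: a 1-D float vector holding 0., 1., ..., 11.
x = torch.arange(12, dtype=torch.float32)

print(x.numel())  # 12: total number of elements
print(x.shape)    # torch.Size([12]): size along each axis
```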
reshape rearranges the same elements into a different shape; the total number of elements is preserved.
tensor([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]])
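A 12-element vector becomes a $3 \times 4$ matrix. Because the vector is contiguous in memory, reshape returns a view: no data is copied, only the shape and stride metadata change. (On a non-contiguous input, reshape may have to copy.)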
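A sketch of the reshape step, again assuming the vector x from torch.arange (the names x, X, and X2 are illustrative). Passing -1 for one dimension lets PyTorch infer it from the element count, and comparing data_ptr() values checks that the result shares x's buffer:

```python
import torch

x = torch.arange(12, dtype=torch.float32)

X = x.reshape(3, 4)       # explicit 3 x 4
X2 = x.reshape(-1, 4)     # -1 is inferred as 12 / 4 = 3 rows

print(X.shape, X2.shape)             # both torch.Size([3, 4])
print(X.data_ptr() == x.data_ptr())  # True: a view over x's memory
print(X.stride())                    # (4, 1): 4 elements per row step, 1 per column step
```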
Constant fills such as zeros take a shape tuple of any rank and size:
tensor([[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],
[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]])
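The output above matches a rank-3 shape (2, 3, 4), two blocks of 3 rows by 4 columns. A minimal sketch, assuming the call was torch.zeros with that tuple:

```python
import torch

# Shape tuple (2, 3, 4): two 3 x 4 blocks of zeros
Z = torch.zeros((2, 3, 4))
print(Z.shape)   # torch.Size([2, 3, 4])
print(Z.dtype)   # torch.float32, the default floating-point dtype
```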
For weight initialization, randn draws each element independently from the standard normal distribution $\mathcal{N}(0, 1)$:
tensor([[ 0.9909, -0.7055, 1.9036, -0.8750],
[ 0.7395, -0.8414, -0.6979, 1.9743],
[ 0.6293, -0.5748, 0.2322, -0.3098]])
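The exact numbers above depend on the random state, so a sketch can only reproduce the shape. Seeding with torch.manual_seed makes a draw repeatable, and a large sample shows the empirical mean and standard deviation approaching 0 and 1 (the seed value and sample size here are arbitrary choices):

```python
import torch

torch.manual_seed(0)      # fix the RNG so reruns give identical values
W = torch.randn(3, 4)     # each entry drawn independently from N(0, 1)
print(W.shape)            # torch.Size([3, 4])

# With many samples, mean -> 0 and std -> 1
big = torch.randn(100_000)
print(big.mean().item(), big.std().item())
```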
ones, full(shape, value), eye(n) (the identity matrix), empty (allocated but uninitialized, hence fastest), and the *_like(x) variants (same shape and dtype as x) round out the family.
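A quick tour of the rest of the family; the shapes and the fill value 7.0 are arbitrary, and x is assumed to be the 3 × 4 float matrix from earlier:

```python
import torch

x = torch.arange(12, dtype=torch.float32).reshape(3, 4)

ones = torch.ones(2, 3)           # all 1.0
sevens = torch.full((2, 3), 7.0)  # every entry set to the given value
I = torch.eye(3)                  # 3 x 3 identity matrix
buf = torch.empty(2, 3)           # allocated, NOT initialized: contents are arbitrary
zl = torch.zeros_like(x)          # same shape and dtype as x, filled with 0
ol = torch.ones_like(x)           # same shape and dtype as x, filled with 1
```

Only read from an empty tensor after you have written every element; until then its values are whatever was left in memory.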
For exact control, pass a (nested) Python list to torch.tensor; rows follow the same row-major convention as NumPy:
tensor([[2, 1, 4, 3],
[1, 2, 3, 4],
[4, 3, 2, 1]])
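A sketch that builds the matrix above (the name Y is an assumption). The dtype is inferred from the literals: all-integer input gives int64, and a single float literal is enough to get float32:

```python
import torch

# Outer list = rows, inner lists = columns (row-major, like NumPy)
Y = torch.tensor([[2, 1, 4, 3],
                  [1, 2, 3, 4],
                  [4, 3, 2, 1]])
print(Y.shape)   # torch.Size([3, 4])
print(Y.dtype)   # torch.int64, inferred from the Python ints

Yf = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
print(Yf.dtype)  # torch.float32
```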
Standard NumPy-style indexing:
X[-1]: the last row
X[1:3]: rows 1 and 2 (the stop index 3 is exclusive)
(tensor([ 8., 9., 10., 11.]),
tensor([[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.]]))
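A sketch of these reads on the 3 × 4 matrix X (assumed to be arange(12) reshaped), extended with a column, a 2-D block, and a single element to show that each axis takes its own index or slice:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)

last_row = X[-1]      # tensor([ 8.,  9., 10., 11.])
middle = X[1:3]       # rows 1 and 2
col = X[:, 1]         # column 1 of every row: 1., 5., 9.
block = X[0:2, 1:3]   # rows 0-1, columns 1-2
elem = X[1, 2]        # 0-d tensor holding 6.
```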
Assignment works the same way; writing X[1, 2] = 17 changes a single element:
tensor([[ 0., 1., 2., 3.],
[ 4., 5., 17., 7.],
[ 8., 9., 10., 11.]])
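A sketch of element and slice assignment on a fresh X (assumed to be arange(12) reshaped to 3 × 4). The single write matches the output above; the slice write is an extra illustration showing that a scalar on the right-hand side is broadcast to every selected element:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)

X[1, 2] = 17    # one element: row 1, column 2
X[:2, :] = 12   # every element of rows 0 and 1 (overwrites the 17)
print(X)        # rows 0 and 1 are all 12., row 2 is unchanged
```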
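Most common math functions, such as exp, apply elementwise: same shape in, same shape out. Below, torch.exp is applied to the 12-element vector, whose first eight entries had been overwritten with 12 (so they map to $e^{12} \approx 162754.8$) while the last four are 8 through 11: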
tensor([162754.7969, 162754.7969, 162754.7969, 162754.7969, 162754.7969,
162754.7969, 162754.7969, 162754.7969, 2980.9580, 8103.0840,
22026.4648, 59874.1406])
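A small, self-contained check of elementwise exp on made-up inputs, plus the $e^{12}$ value that dominates the output above:

```python
import torch

x = torch.tensor([0.0, 1.0, 2.0])
y = torch.exp(x)            # e**v applied to each element v
print(y.shape == x.shape)   # True: elementwise ops preserve shape
print(y)                    # approximately 1, e, e**2

print(torch.exp(torch.tensor(12.0)).item())  # about 162754.8
```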
The arithmetic operators +, -, *, / and ** are overloaded to run elementwise:
(tensor([ 3., 4., 6., 10.]),
tensor([-1., 0., 2., 6.]),
tensor([ 2., 4., 8., 16.]),
tensor([0.5000, 1.0000, 2.0000, 4.0000]),
tensor([ 1., 4., 16., 64.]))
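The five results above are consistent with x = [1, 2, 4, 8] and y = [2, 2, 2, 2]; a sketch assuming those inputs (the values are inferred from the outputs, the names are illustrative):

```python
import torch

x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2.0, 2, 2, 2])

# Each operator pairs x[i] with y[i]
print(x + y, x - y, x * y, x / y, x ** y)
```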
cat concatenates tensors along an existing axis, chosen with dim:
dim=0: append rows (a taller matrix)
dim=1: append columns (a wider matrix)
(tensor([[ 0., 1., 2., 3.],
[ 4., 5., 6., 7.],
[ 8., 9., 10., 11.],
[ 2., 1., 4., 3.],
[ 1., 2., 3., 4.],
[ 4., 3., 2., 1.]]),
tensor([[ 0., 1., 2., 3., 2., 1., 4., 3.],
[ 4., 5., 6., 7., 1., 2., 3., 4.],
[ 8., 9., 10., 11., 4., 3., 2., 1.]]))
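A sketch reproducing both concatenations, assuming X is arange(12) reshaped to 3 × 4 and Y is the float version of the literal matrix. All dimensions other than dim must match; torch.stack, by contrast, adds a new axis:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

rows = torch.cat((X, Y), dim=0)   # (3 + 3) x 4
cols = torch.cat((X, Y), dim=1)   # 3 x (4 + 4)
print(rows.shape, cols.shape)     # torch.Size([6, 4]) torch.Size([3, 8])

stacked = torch.stack((X, Y))     # new leading axis: 2 x 3 x 4
print(stacked.shape)
```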
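Comparison operators such as == work elementwise, broadcasting when shapes differ, and return a boolean tensor. Here X == Y is True exactly where the two matrices agree; such masks are handy for selecting the entries that satisfy a condition: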
tensor([[False, True, False, True],
[False, False, False, False],
[False, False, False, False]])
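A sketch of comparison and masking, assuming the same X and Y as in the cat example. Indexing with a boolean mask returns a 1-D tensor of the selected values; the threshold 5 is arbitrary:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

eq = X == Y          # boolean tensor, True where entries match
print(eq.sum())      # number of matches: 2

mask = X > 5         # compare against a scalar (broadcast)
print(X[mask])       # the values 6. through 11., flattened to 1-D
```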
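When an elementwise operation meets tensors of different shapes, broadcasting virtually expands them to a common shape, without copying data.
The rule: align the shapes from the trailing dimension; each pair of sizes must be equal or include a 1 (a missing leading dimension counts as 1), and size-1 dimensions stretch to match the other size.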
(tensor([[0],
[1],
[2]]),
tensor([[0, 1]]))
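The two tensors above have shapes (3, 1) and (1, 2), so both stretch to (3, 2). A sketch assuming they come from torch.arange (names a and b are illustrative), including a pair of shapes that violates the rule:

```python
import torch

a = torch.arange(3).reshape(3, 1)   # shape (3, 1)
b = torch.arange(2).reshape(1, 2)   # shape (1, 2)

# Trailing dims 1 vs 2: a stretches; leading dims 3 vs 1: b stretches
c = a + b                           # shape (3, 2), c[i, j] = i + j
print(c)

try:
    torch.ones(3) + torch.ones(2)   # sizes 3 vs 2: neither is 1
except RuntimeError as err:
    print("incompatible shapes:", err)
```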
Every assignment of an arithmetic expression, such as Y = Y + X, allocates a new tensor for the result and rebinds the name to it. That matters a lot when Y is gigabytes:
False
id(Y) == before is False: Y now points at a brand-new buffer.
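A sketch of that check, assuming X is the 3 × 4 arange matrix and Y starts as a matrix of ones (both are illustrative choices). The result of Y + X is created while the old Y is still alive, so it is necessarily a different object:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Y = torch.ones(3, 4)

before = id(Y)
Y = Y + X               # computes into a NEW tensor, then rebinds the name Y
print(id(Y) == before)  # False
```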
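To keep the result in a fixed buffer, pre-allocate the output (for example with zeros_like) and write into it with slice assignment, Z[:] = <expression>; id(Z) is unchanged: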
id(Z): 138997007585184
id(Z): 138997007585184
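A sketch of three in-place patterns, with X and Y assumed as in the previous example. Z[:] = X + Y keeps Z's identity but still builds a temporary for X + Y; the out= argument of torch.add writes the sum straight into Z, and augmented operators like += update the left operand's own storage:

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Y = torch.ones(3, 4)

Z = torch.zeros_like(Y)     # pre-allocate once
before = id(Z)
Z[:] = X + Y                # temporary for X + Y, copied into Z's buffer
print(id(Z) == before)      # True

torch.add(X, Y, out=Z)      # result written directly into Z

ptr = X.data_ptr()
X += Y                      # in-place: updates X's existing buffer
print(X.data_ptr() == ptr)  # True
```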
Tensors and NumPy ndarrays convert cheaply: for CPU tensors, .numpy() and torch.from_numpy share the underlying storage instead of copying, so a write through one is visible in the other:
(numpy.ndarray, torch.Tensor)
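A sketch of the shared-memory behavior on a small CPU tensor (names and values are illustrative; NumPy must be installed). torch.tensor(ndarray), unlike torch.from_numpy, makes an independent copy, and .item() turns a one-element tensor into a Python scalar:

```python
import torch

X = torch.arange(6, dtype=torch.float32)

A = X.numpy()              # ndarray over X's memory (CPU tensors only)
B = torch.from_numpy(A)    # tensor over A's memory
print(type(A), type(B))    # numpy.ndarray, torch.Tensor

A[0] = 100.0                      # write through the ndarray...
print(X[0].item(), B[0].item())   # 100.0 100.0: one shared buffer

C = torch.tensor(A)        # copies the data
A[1] = -1.0
print(C[1].item())         # 1.0: the copy is unaffected

s = torch.tensor([3.5])
print(s.item(), float(s))  # 3.5 3.5
```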
arange / zeros / ones / randn / tensor(list): create.
.shape, .numel(), reshape: inspect and reorganize.
[i, j], [a:b, c:d]: read and write slices.
+ - * / **, cat, ==, sum: elementwise ops, joins, comparisons, reductions.
.numpy() / .item(): leave the tensor world.