Data Manipulation

Dive into Deep Learning · §1.1

Storing & transforming data with tensors
The n-dimensional arrays that every model in this book is built on.

The tensor: our basic data structure

Motivation

An n-dimensional array of numbers that generalizes the NumPy ndarray.
Runs on GPUs and other accelerators.
Records operations for automatic differentiation.

Rank = number of axes; shape = size per axis.

Getting Started

creating & inspecting tensors

Create a vector, then inspect it

arange(n) builds a 1-D tensor of evenly spaced values:

x = torch.arange(12, dtype=torch.float32)
x

tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])

x.shape

torch.Size([12])

numel() → total elements. shape → size along each axis. We ask for float32 because nearly all neural-net math is in floating point.
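
The three inspection calls mentioned above can be tried together. This is a minimal sketch that reuses the slide's x; the comparison with a default-dtype arange is an added illustration, not part of the original deck.

```python
import torch

x = torch.arange(12, dtype=torch.float32)

print(x.numel())   # total number of elements: 12
print(x.shape)     # size along each axis: torch.Size([12])
print(x.dtype)     # torch.float32, because we asked for it

# Without an explicit dtype, arange over Python ints yields 64-bit integers.
default_dtype = torch.arange(12).dtype
print(default_dtype)  # torch.int64
```

The last line suggests why the slide passes dtype explicitly: integer tensors are the default for integer arguments.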

randn breaks symmetry; lists pin exact values

For weight init, randn draws from \mathcal{N}(0, 1):

torch.randn(3, 4)

tensor([[ 0.0760,  0.9297,  1.0870, -0.4283],
        [ 0.4409, -0.2029, -0.0518,  0.6283],
        [ 0.6729,  0.7885, -1.2366,  1.3750]])

Or type exact values as a list:

torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

tensor([[2, 1, 4, 3],
        [1, 2, 3, 4],
        [4, 3, 2, 1]])

Also zeros, ones, full(shape, value), eye(n). Random values break symmetry when initializing network weights; lists let you type a tensor by hand.
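
A short sketch of the other constructors named above. The shapes, the fill value 7.0, and the seed are arbitrary choices made for illustration.

```python
import torch

zeros = torch.zeros((2, 3))        # every entry 0.0
ones = torch.ones((2, 3))          # every entry 1.0
sevens = torch.full((2, 3), 7.0)   # every entry 7.0
identity = torch.eye(3)            # 3x3 identity matrix

torch.manual_seed(0)               # arbitrary seed, only for repeatable draws
noise = torch.randn(3, 4)          # entries drawn from a standard normal

for t in (zeros, ones, sevens, identity, noise):
    print(tuple(t.shape), t.dtype)
```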

Reshape: same data, new layout

Same elements in a new shape; numel is preserved:

X = x.reshape(3, 4)
X

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])

Usually no copy: only the shape metadata changes. Use -1 to infer an axis: x.reshape(3, -1).
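
The "no copy" claim can be checked directly. The sketch below assumes a contiguous tensor, for which reshape returns a view; data_ptr() reports the address of the first element in storage.

```python
import torch

x = torch.arange(12, dtype=torch.float32)
X = x.reshape(3, -1)          # -1 is inferred as 12 / 3 = 4

print(X.shape)                # torch.Size([3, 4])
print(X.numel() == x.numel()) # True: element count is preserved

# Same underlying buffer, so no data was copied.
shares_storage = X.data_ptr() == x.data_ptr()
print(shares_storage)         # True

# Consequence of sharing: a write through X is visible through x.
X[0, 0] = 100.0
print(x[0])                   # tensor(100.)
```

Because the two names share storage, call clone() first when an independent copy is needed.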

Indexing & Slicing

reading & writing elements, rows, ranges

Reading: elements, rows, ranges

X[-1] is the last row;
X[1:3] is rows 1–2:

X[-1], X[1:3]

(tensor([ 8.,  9., 10., 11.]),
 tensor([[ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.]]))

0-based; negatives count from the end; a range a:b is half-open (b excluded).
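
A few more read patterns that follow from the same rules, sketched on a fresh copy of X. The variable names are chosen here for illustration.

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)

first_row = X[0]         # row 0: [0., 1., 2., 3.]
last_elem = X[-1, -1]    # last row, last column: 11.
col_1 = X[:, 1]          # every row, column 1: [1., 5., 9.]
block = X[0:2, 1:3]      # rows 0-1, columns 1-2 (upper bounds excluded)

print(first_row, last_elem, col_1, block, sep="\n")
```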

Writing: one cell or a whole region

Assignment writes in place.
One element, or a whole slice:

X[1, 2] = 17
X

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5., 17.,  7.],
        [ 8.,  9., 10., 11.]])

X[:2, :] = 12
X

tensor([[12., 12., 12., 12.],
        [12., 12., 12., 12.],
        [ 8.,  9., 10., 11.]])

Operations

elementwise math, joins, comparisons, broadcasting

Elementwise ops: matching shapes, entry by entry

The operators + - * / ** act elementwise on matching shapes:

x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
x + y, x - y, x * y, x / y, x ** y

(tensor([ 3.,  4.,  6., 10.]),
 tensor([-1.,  0.,  2.,  6.]),
 tensor([ 2.,  4.,  8., 16.]),
 tensor([0.5000, 1.0000, 2.0000, 4.0000]),
 tensor([ 1.,  4., 16., 64.]))

Unary functions like exp map each element:

torch.exp(x)

tensor([2.7183e+00, 7.3891e+00, 5.4598e+01, 2.9810e+03])

Any scalar→scalar map (exp, sin, log) extends to a whole tensor.
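
One way to convince yourself that a unary function really acts entry by entry is to compare it with a plain Python loop over the elements. This is a sketch; math.exp serves as the scalar reference.

```python
import math
import torch

x = torch.tensor([1.0, 2, 4, 8])

exp_x = torch.exp(x)
# Apply the scalar function to each entry separately, then compare.
expected = torch.tensor([math.exp(v) for v in x.tolist()])
matches_scalar_map = torch.allclose(exp_x, expected)
print(matches_scalar_map)      # True

roundtrip = torch.log(exp_x)   # log undoes exp, up to rounding
print(roundtrip)
print(torch.sin(x).shape)      # torch.Size([4]): the shape is unchanged
```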

Concatenate along an axis

cat joins tensors along an existing axis: dim=0 adds rows, dim=1 widens:

X = torch.arange(12, dtype=torch.float32).reshape((3,4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
torch.cat((X, Y), dim=0), torch.cat((X, Y), dim=1)

Every other axis must already match.
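
The resulting shapes make the rule concrete. The sketch below also tries a deliberately mismatched pair; the (2, 5) tensor is an assumption added for illustration.

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

rows = torch.cat((X, Y), dim=0)   # Y's rows go below X's rows
cols = torch.cat((X, Y), dim=1)   # Y's columns go to the right of X's
print(rows.shape)                 # torch.Size([6, 4])
print(cols.shape)                 # torch.Size([3, 8])

# Joining on dim=0 needs matching column counts: 4 vs 5 fails.
cat_failed = False
try:
    torch.cat((X, torch.zeros(2, 5)), dim=0)
except RuntimeError as err:
    cat_failed = True
    print("cat failed:", type(err).__name__)
```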

Comparisons build masks; reductions collapse

Comparisons return a boolean tensor.
A ready-made mask:

X == Y

tensor([[False,  True, False,  True],
        [False, False, False, False],
        [False, False, False, False]])

Reductions collapse axes; with no dim= the result is a scalar:

X.sum()

tensor(66.)

==, <, > build masks; sum, mean, max collapse axes; add dim= to reduce just one.
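
A sketch of per-axis reductions and of using a mask, with the same X and Y as above. Indexing with a boolean mask (X[mask]) does not appear in the original slides and is included here as an illustration.

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

col_sums = X.sum(dim=0)     # collapse the row axis: [12., 15., 18., 21.]
row_means = X.mean(dim=1)   # collapse the column axis: [1.5, 5.5, 9.5]

mask = X == Y               # boolean tensor with the shape of X
n_equal = mask.sum()        # True counts as 1, so this is tensor(2)
selected = X[mask]          # entries of X where the mask is True: [1., 3.]

print(col_sums, row_means, n_equal, selected, sep="\n")
```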

Broadcasting stretches size-1 axes for free

Operations · the exception

Size-1 axes are virtually stretched: a 3×1 tensor plus a 1×2 tensor gives a 3×2 result:

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
a, b

a + b

tensor([[0, 1],
        [1, 2],
        [2, 3]])

Any axis of size 1 stretches to match the other tensor, without a copy.

Two shapes are compatible only if, in every pair of aligned axes, the sizes are equal or one of them is 1.

…or it refuses: no size-1 axis, no guess

Line up (3, 2) and (2, 3) from the right, pairing 2 with 3 and 3 with 2: no pair matches, neither member is 1, so the framework raises rather than guessing:

try:
    torch.ones((3, 2)) + torch.ones((2, 3))
except Exception as e:
    print(e)

The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1

Broadcasting aligns shapes from the right; in each axis pair the sizes must be equal, or one of them must be 1.
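
The alignment rule is short enough to write out by hand. The function broadcast_shape below is a hypothetical helper written for this illustration, not a PyTorch API; it treats a missing leading axis as size 1, which is how broadcasting handles tensors of different rank.

```python
import torch

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: result shape under right-aligned broadcasting."""
    result = []
    # Walk both shapes from the last axis; a missing axis counts as size 1.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        a = shape_a[-i] if i <= len(shape_a) else 1
        b = shape_b[-i] if i <= len(shape_b) else 1
        if a != b and a != 1 and b != 1:
            raise ValueError(f"cannot broadcast {tuple(shape_a)} with {tuple(shape_b)}")
        result.append(b if a == 1 else a)   # the size-1 side is the one stretched
    return tuple(reversed(result))

print(broadcast_shape((3, 1), (1, 2)))    # (3, 2)
print(broadcast_shape((4, 3, 1), (2,)))   # (4, 3, 2)

# Cross-check against PyTorch on the example from the slides.
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
agrees = tuple((a + b).shape) == broadcast_shape(a.shape, b.shape)
print(agrees)                             # True
```

Calling broadcast_shape((3, 2), (2, 3)) raises ValueError, mirroring the refusal shown above.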

Memory & Interop

in-place updates and leaving the tensor world

The hidden cost of Y = Y + X

Performance

Every arithmetic expression allocates a new tensor, which is costly when Y is gigabytes and updated many times per second:

before = id(Y)
Y = Y + X
id(Y) == before

False

id(Y) changed: Y is now bound to a new tensor object.

Saving memory with in-place ops

Write into pre-allocated storage with Z[:] = ...; id(Z) stays the same, so Z is still the same tensor object:

Z = torch.zeros_like(Y)
print('id(Z):', id(Z))
Z[:] = X + Y
print('id(Z):', id(Z))

id(Z): 129792468200224
id(Z): 129792468200224

If X isn’t needed afterward, X += Y is cheapest:

before = id(X)
X += Y
id(X) == before

True
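
Note that id() reports the identity of the Python object, not the address of the data. As a complementary check, the sketch below compares data_ptr(), the address of the first element in the tensor's storage, before and after each kind of update. Y is filled with ones here purely for illustration.

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape((3, 4))
Y = torch.ones((3, 4))

ptr_before = X.data_ptr()   # address of X's buffer

X += Y                      # in place: writes into the existing buffer
same_after_inplace = X.data_ptr() == ptr_before
print(same_after_inplace)   # True

X = X + Y                   # out of place: the sum lands in a new buffer
same_after_rebind = X.data_ptr() == ptr_before
print(same_after_rebind)    # False
```

The second check is False because the old buffer is still alive while X + Y is being computed, so the result cannot reuse it.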

NumPy round-trip: shared storage

Interop

numpy() / from_numpy() convert cheaply and share memory:

A = X.numpy()
B = torch.from_numpy(A)
type(A), type(B)

(numpy.ndarray, torch.Tensor)
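
Shared memory means a write on either side shows up on the other. A minimal sketch, assuming a CPU tensor and an installed NumPy; the values written are arbitrary.

```python
import torch

X = torch.zeros(3)
A = X.numpy()             # NumPy array over the same CPU memory
B = torch.from_numpy(A)   # back to a tensor, still the same memory

A[0] = 7.0                # write through the NumPy array
print(X)                  # tensor([7., 0., 0.])
print(B)                  # tensor([7., 0., 0.])

C = X.clone().numpy()     # clone() first to get an independent copy
C[1] = -1.0
print(X[1])               # tensor(0.): X is unaffected
```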

A size-1 tensor unwraps to a Python scalar:

a = torch.tensor([3.5])
a, a.item(), float(a), int(a)

(tensor([3.5000]), 3.5, 3.5, 3)

Summary

Wrap-up

Tensor = n-d array; the core data structure (GPU + autodiff).
Create: arange, zeros, ones, randn, tensor([…]).
Inspect / restructure: .shape, .numel(), reshape.
Index / slice to read and write: negatives, ranges, regions.

Elementwise math, comparisons (masks), reductions, cat.
Broadcasting stretches size-1 axes and refuses anything else.
Save memory with in-place ops (X[:] = …, +=).
Interop: tensor ↔︎ NumPy, .item() for scalars.
