Data Manipulation

Dive into Deep Learning · §1.1

Storing & transforming data with tensors
The n-dimensional arrays that every model in this book is built on.

The tensor: our basic data structure

Motivation

An n-dimensional array of numbers; generalizes the NumPy ndarray.
Runs on GPUs and other accelerators.
Records operations for automatic differentiation.

Rank = number of axes; shape = size per axis.
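
To make rank and shape concrete, here is a minimal sketch. The variable names (scalar, vector, matrix, cube) are illustrative and not from the slides.

```python
import jax.numpy as jnp

# Four tensors with 0, 1, 2 and 3 axes respectively
scalar = jnp.array(3.0)
vector = jnp.arange(4)
matrix = jnp.zeros((2, 3))
cube = jnp.ones((2, 3, 4))

for t in (scalar, vector, matrix, cube):
    # ndim is the number of axes (the rank); shape lists the size of each axis
    print(t.ndim, t.shape)
```

The length of shape always equals ndim, so a scalar has the empty shape ().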

Getting Started

creating & inspecting tensors

Create a vector, then inspect it

arange(n) builds a 1-D tensor of evenly spaced values:

import jax
from jax import numpy as jnp
x = jnp.arange(12, dtype=jnp.float32)
x

Array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.],      dtype=float32)

x.shape

(12,)

x.size → total elements. x.shape → size along each axis. We ask for float32 because nearly all neural-net math is in floating point.
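
As a quick check of how these attributes relate, the sketch below confirms that the element count is the product of the shape entries. The name n_elements is illustrative only.

```python
import math

import jax.numpy as jnp

x = jnp.arange(12, dtype=jnp.float32)

n_elements = x.size   # total number of elements
print(n_elements, x.shape, x.dtype)

# The element count is always the product of the sizes along every axis
print(n_elements == math.prod(x.shape))
```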

Random normals break symmetry; lists pin exact values

For weight init, jax.random.normal draws from the standard normal distribution N(0, 1):

# Every random function in JAX requires an explicit key.
# Passing the same key to the same function always
# produces the same sample.
jax.random.normal(jax.random.key(0), (3, 4))

Array([[ 1.6226422 ,  2.0252647 , -0.43359444, -0.07861735],
       [ 0.1760909 , -0.97208923, -0.49529874,  0.4943786 ],
       [ 0.6643493 , -0.9501635 ,  2.1795304 , -1.9551506 ]],      dtype=float32)

Or type exact values as a list:

jnp.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

Array([[2, 1, 4, 3],
       [1, 2, 3, 4],
       [4, 3, 2, 1]], dtype=int32)

Also zeros, ones, full(shape, value), eye(n). Random values break symmetry when initializing network weights; lists let you type a tensor by hand.
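
The constructors named above, and the key behaviour of JAX random functions, can be sketched as follows. Variable names are illustrative; jax.random.split is one common way to derive fresh keys and is shown here as an assumption about typical usage, not something the slides prescribe.

```python
import jax
import jax.numpy as jnp

# Constant-filled constructors
zeros = jnp.zeros((2, 3))
ones = jnp.ones((2, 3))
sevens = jnp.full((2, 3), 7.0)
identity = jnp.eye(3)

# Same key, same sample: JAX randomness is fully determined by the key
key = jax.random.key(0)
sample_a = jax.random.normal(key, (3, 4))
sample_b = jax.random.normal(key, (3, 4))
print(jnp.array_equal(sample_a, sample_b))

# Splitting the key yields independent keys, hence different samples
key1, key2 = jax.random.split(key)
sample_c = jax.random.normal(key1, (3, 4))
print(jnp.array_equal(sample_a, sample_c))
```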

Reshape: same data, new layout

Same elements in a new shape; the element count (size) is preserved:

X = x.reshape(3, 4)
X

Array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]], dtype=float32)

Usually no copy: only the shape metadata changes. Use -1 to infer an axis: x.reshape(3, -1).
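
A short sketch of axis inference with -1. The names Z and flat are illustrative.

```python
import jax.numpy as jnp

x = jnp.arange(12, dtype=jnp.float32)

X = x.reshape(3, -1)   # -1 is inferred as 12 / 3 = 4, giving shape (3, 4)
Z = x.reshape(-1, 6)   # -1 is inferred as 12 / 6 = 2, giving shape (2, 6)
flat = X.reshape(-1)   # a lone -1 flattens back to shape (12,)

print(X.shape, Z.shape, flat.shape)
```

Only one axis can be -1, because the remaining sizes must determine it uniquely.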

Indexing & Slicing

reading & writing elements, rows, ranges

Reading: elements, rows, ranges

X[-1] is the last row;
X[1:3] is rows 1–2:

X[-1], X[1:3]

(Array([ 8.,  9., 10., 11.], dtype=float32),
 Array([[ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]], dtype=float32))

0-based; negatives count from the end; a range a:b is half-open (b excluded).
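
These indexing rules extend to several axes at once. The following is a small sketch with illustrative variable names:

```python
import jax.numpy as jnp

X = jnp.arange(12, dtype=jnp.float32).reshape(3, 4)

element = X[1, 2]      # row 1, column 2: the value 6.0
last_col = X[:, -1]    # every row, last column: [3, 7, 11]
block = X[0:2, 1:3]    # rows 0-1, columns 1-2: [[1, 2], [5, 6]]
every_other = X[::2]   # a step of 2 keeps rows 0 and 2

print(element, last_col, block, every_other, sep="\n")
```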

Writing: arrays are immutable

JAX arrays can’t be mutated.
.at[i].set(v) returns a new array:

# JAX arrays are immutable. jax.numpy.ndarray.at index
# update operators create a new array with the corresponding
# modifications made
X_new_1 = X.at[1, 2].set(17)
X_new_1

Array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5., 17.,  7.],
       [ 8.,  9., 10., 11.]], dtype=float32)

Writing: updates chain

Each .at[...].set(...) returns a new array, so updates chain:

X_new_2 = X_new_1.at[:2, :].set(12)
X_new_2

Array([[12., 12., 12., 12.],
       [12., 12., 12., 12.],
       [ 8.,  9., 10., 11.]], dtype=float32)
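
The two updates can also be written as a single chained expression. The sketch below also uses .at[...].add(...), another of the index update operators, to show that the original array is never modified. Names are illustrative.

```python
import jax.numpy as jnp

X = jnp.arange(12, dtype=jnp.float32).reshape(3, 4)

# Each .set returns a new array, so the calls compose left to right
X_updated = X.at[1, 2].set(17).at[:2, :].set(12)

# .add accumulates into the selected region instead of overwriting it
X_added = X.at[0].add(1.0)

print(X_updated)
print(X_added)
print(X[1, 2])   # the original still holds 6.0
```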

Operations

elementwise math, joins, comparisons, broadcasting

Elementwise ops: matching shapes, entry by entry

The operators + - * / ** act elementwise on matching shapes:

x = jnp.array([1.0, 2, 4, 8])
y = jnp.array([2, 2, 2, 2])
x + y, x - y, x * y, x / y, x ** y

(Array([ 3.,  4.,  6., 10.], dtype=float32),
 Array([-1.,  0.,  2.,  6.], dtype=float32),
 Array([ 2.,  4.,  8., 16.], dtype=float32),
 Array([0.5, 1. , 2. , 4. ], dtype=float32),
 Array([ 1.,  4., 16., 64.], dtype=float32))

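Unary functions like exp map each element. Here exp is applied to the original 12-element vector, since x was rebound above:

jnp.exp(jnp.arange(12, dtype=jnp.float32))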

Array([1.0000000e+00, 2.7182817e+00, 7.3890562e+00, 2.0085537e+01,
       5.4598148e+01, 1.4841316e+02, 4.0342880e+02, 1.0966332e+03,
       2.9809580e+03, 8.1030840e+03, 2.2026467e+04, 5.9874145e+04],      dtype=float32)

Any scalar→scalar map (exp, sin, log) extends to a whole tensor.
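
A brief sketch of other scalar maps applied elementwise; the output always has the input's shape. Variable names are illustrative.

```python
import jax.numpy as jnp

x = jnp.array([1.0, 2.0, 4.0, 8.0])

log2_x = jnp.log2(x)               # [0, 1, 2, 3], since x holds powers of two
roundtrip = jnp.exp(jnp.log(x))    # exp undoes log, up to float32 rounding
sines = jnp.sin(x)                 # same shape as x, one sine per entry

print(log2_x, roundtrip, sines.shape)
```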

Concatenate along an axis

concatenate joins along an existing axis: axis=0 adds rows, axis=1 widens:

X = jnp.arange(12, dtype=jnp.float32).reshape((3, 4))
Y = jnp.array([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
jnp.concatenate((X, Y), axis=0), jnp.concatenate((X, Y), axis=1)

Every other axis must already match.
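
The effect on shapes can be checked directly. This sketch reuses the X and Y defined above; rows and cols are illustrative names.

```python
import jax.numpy as jnp

X = jnp.arange(12, dtype=jnp.float32).reshape((3, 4))
Y = jnp.array([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

rows = jnp.concatenate((X, Y), axis=0)   # stack vertically: 3 + 3 rows
cols = jnp.concatenate((X, Y), axis=1)   # stack horizontally: 4 + 4 columns

print(rows.shape)   # (6, 4)
print(cols.shape)   # (3, 8)
```

Along the joined axis the sizes add; along every other axis they must agree.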

Comparisons build masks; reductions collapse

Comparisons return a boolean tensor, which serves as a ready-made mask:

X == Y

Array([[False,  True, False,  True],
       [False, False, False, False],
       [False, False, False, False]], dtype=bool)

Reductions collapse axes; with no axis= argument the result is a scalar:

X.sum()

Array(66., dtype=float32)

==, <, > build masks; sum, mean, max collapse axes; pass axis= to reduce just one.
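
A sketch of per-axis reductions and of using a mask, with the same X and Y as above. The use of jnp.where to apply the mask is an illustrative choice, not something the slides prescribe.

```python
import jax.numpy as jnp

X = jnp.arange(12, dtype=jnp.float32).reshape((3, 4))
Y = jnp.array([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

mask = X == Y
n_equal = mask.sum()          # True counts as 1, so this counts the matches: 2

col_sums = X.sum(axis=0)      # collapse the rows: [12, 15, 18, 21]
row_means = X.mean(axis=1)    # collapse the columns: [1.5, 5.5, 9.5]

# Keep X where the mask is True, fill with -1 elsewhere
picked = jnp.where(mask, X, -1.0)

print(n_equal, col_sums, row_means, picked, sep="\n")
```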

Broadcasting stretches size-1 axes for free

Operations · the exception

Size-1 axes are virtually stretched: a 3×1 plus a 1×2 gives a 3×2:

a = jnp.arange(3).reshape((3, 1))
b = jnp.arange(2).reshape((1, 2))
a, b

a + b

Array([[0, 1],
       [1, 2],
       [2, 3]], dtype=int32)

Any axis of size 1 stretches to match the other tensor, without a copy.

Two shapes are compatible only if, in every aligned axis pair, the sizes are equal or one of them is 1.

…or it refuses: no size-1 axis, no guess

Line up (3, 2) and (2, 3) from the right, pairing 2 with 3 and 3 with 2: no pair matches, neither member is 1, so the framework raises rather than guessing:

try:
    jnp.ones((3, 2)) + jnp.ones((2, 3))
except Exception as e:
    print(e)

add got incompatible shapes for broadcasting: (3, 2), (2, 3).

Broadcasting aligns shapes from the right; each axis pair must be equal or 1.
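
The rule can be tested without allocating any arrays. In the sketch below, broadcast_result is a hypothetical helper written for this illustration; it wraps jnp.broadcast_shapes and returns None when the shapes are incompatible.

```python
import jax.numpy as jnp


def broadcast_result(shape_a, shape_b):
    """Hypothetical helper: the broadcast shape, or None if incompatible."""
    try:
        return jnp.broadcast_shapes(shape_a, shape_b)
    except (ValueError, TypeError):
        return None


print(broadcast_result((3, 1), (1, 2)))     # (3, 2): both size-1 axes stretch
print(broadcast_result((4, 3, 2), (2,)))    # (4, 3, 2): aligned from the right
print(broadcast_result((3, 2), (2, 3)))     # None: 2 vs 3 and 3 vs 2 both fail
```

The second call shows why alignment starts from the right: the shorter shape (2,) is matched against the last axis, and the missing leading axes are treated as size 1.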

Memory & Interop

in-place updates and leaving the tensor world

The hidden cost of `Y = Y + X`

Performance

Every arithmetic expression allocates a new tensor, which is costly when Y is gigabytes and updated many times per second:

before = id(Y)
Y = Y + X
id(Y) == before

False

id(Y) changed: Y is now bound to a new tensor object.

Memory under immutability

JAX has no in-place write; a functional update returns a new array, so id changes:

# JAX arrays are immutable, so functional updates return new arrays.
# Use the `.at[...].set(...)` syntax; under JIT, XLA can fuse these
# updates and reuse buffers, recovering most of the in-place benefit.
X_new = X.at[:].set(X + Y)
id(X_new) == id(X)

False

Under jit, XLA can fuse such updates and reuse buffers, recovering most of the in-place benefit.
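
A minimal sketch of the same update wrapped in jax.jit. The function name add_into is hypothetical, and whether XLA actually reuses a buffer depends on the compiler and backend; the code only demonstrates that the functional form compiles and leaves its inputs untouched.

```python
import jax
import jax.numpy as jnp


@jax.jit
def add_into(X, Y):
    # Functional update: written like an overwrite of X, but returns a new array.
    # Inside the compiled function XLA is free to fuse the add and the update.
    return X.at[:].set(X + Y)


X = jnp.arange(12, dtype=jnp.float32).reshape((3, 4))
Y = jnp.ones((3, 4), dtype=jnp.float32)

X_new = add_into(X, Y)
print(X_new)
print(X[0, 0])   # the input array is unchanged: 0.0
```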

Converting to other Python objects

Interop

Convert to / from a NumPy ndarray:

A = jax.device_get(X)
B = jax.device_put(A)
type(A), type(B)

(numpy.ndarray, jaxlib._jax.ArrayImpl)

The result is a copy; host/device arrays don’t share storage here.

A size-1 tensor unwraps to a Python scalar with .item():

a = jnp.array(3.5)
a, a.item(), float(a), int(a)

(Array(3.5, dtype=float32, weak_type=True), 3.5, 3.5, 3)
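
To illustrate that the conversion copies, the sketch below uses np.array, which makes an explicit writable host copy; this is an alternative to jax.device_get chosen here for illustration. Changing the NumPy copy leaves the JAX array untouched.

```python
import jax.numpy as jnp
import numpy as np

X = jnp.arange(12, dtype=jnp.float32).reshape(3, 4)

A = np.array(X)      # explicit host-side copy as a NumPy ndarray
A[0, 0] = 100.0      # NumPy arrays are mutable; this edits only the copy

B = jnp.asarray(A)   # back to a JAX array, carrying the edited value

total = X.sum().item()   # a scalar result unwraps to a plain Python float

print(X[0, 0], B[0, 0], total)
```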

Summary

Wrap-up

Tensor = n-d array; the core data structure (GPU + autodiff).
Create: arange, zeros, ones, jax.random.normal, jnp.array([…]).
Inspect / restructure: .shape, .size, reshape.
Index / slice to read; write with .at[…].set(…): negatives, ranges, regions.
Elementwise math, comparisons (masks), reductions, concatenate.
Broadcasting stretches size-1 axes and refuses anything else.
JAX has no in-place ops; under jit, XLA can reuse buffers to save memory.
Interop: tensor ↔︎ NumPy, .item() for scalars.
