Data Manipulation

Tensor Basics

A tensor is an n-dimensional array of numbers — the fundamental data structure for everything that follows in this book.

  • Like a NumPy ndarray, but GPU-accelerated and differentiable.
  • 1-D tensor → vector, 2-D → matrix, n-D → general tensor.
  • All four frameworks expose nearly identical tensor APIs.

In this section: how to create, reshape, index, operate on, and share memory with tensors.

Getting Started

A single import wires up the framework's tensor library; npx.set_np() then switches MXNet to NumPy-compatible array semantics:

from mxnet import np, npx
npx.set_np()

np.arange(12) builds a 1-D tensor holding the 12 evenly spaced values 0 through 11 (stored as floats by default in MXNet). This is our running example:

x = np.arange(12)
x

Shape and size

Two attributes you’ll reach for constantly:

  • .size: the total number of elements
  • .shape: the size along each axis (a tuple)

x.size
x.shape

Reshaping

reshape rearranges the same elements into a different shape; the total number of elements is preserved.

X = x.reshape(3, 4)
X

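A 12-element vector becomes a 3 × 4 matrix. The element values and their row-major order are unchanged; where the framework can return a view, no data is copied and only the shape and stride metadata change.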
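
As a rough illustration of the reshape behavior described above, the sketch below uses plain NumPy as a stand-in for mxnet.np (an assumption made so the example runs without MXNet installed). It shows that -1 lets reshape infer one axis, and that NumPy returns a view for a contiguous array.

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet.np

x = np.arange(12)
X = x.reshape(3, 4)         # explicit target shape
X_auto = x.reshape(-1, 4)   # -1 asks reshape to infer the row count (12 / 4 = 3)

# In NumPy, reshaping a contiguous array returns a view on the same buffer.
shares = np.shares_memory(x, X)

print(X.shape, X_auto.shape, shares)
```

Whether a given framework returns a view or a copy here is an implementation detail, so it is safer to check than to assume.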

Filled and random tensors

Constant fills take a shape tuple — any rank, any size:

np.zeros((2, 3, 4))

For weight initialization, np.random.normal draws each element independently from the standard normal distribution N(0, 1):

np.random.normal(0, 1, size=(3, 4))

ones, full(shape, value), eye(n), empty (uninitialized, fastest), and *_like(x) round out the family.
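
The other constructors in this family follow the same pattern. The sketch below uses plain NumPy as a stand-in for mxnet.np (an assumption); the variable names are illustrative only.

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet.np

ones = np.ones((2, 3))          # every entry is 1.0
sevens = np.full((2, 3), 7.0)   # every entry is 7.0
identity = np.eye(3)            # 3 x 3 identity matrix
like = np.zeros_like(sevens)    # zeros with the shape and dtype of `sevens`

print(ones.shape, sevens[0, 0], identity.trace(), like.shape)
```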

Tensors from Python lists

For exact control, pass a (nested) list literal — same row-major convention as NumPy:

np.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

Reading

Standard NumPy-style indexing:

  • X[-1]: the last row
  • X[1:3]: rows 1 and 2 (3 is exclusive)

X[-1], X[1:3]
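
A small worked example of the indexing rules above, sketched in plain NumPy as a stand-in for mxnet.np (an assumption). The column and single-element lookups are extra illustrations beyond the two cases listed.

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet.np

X = np.arange(12).reshape(3, 4)

last_row = X[-1]     # the last row: 8, 9, 10, 11
rows_1_2 = X[1:3]    # rows 1 and 2; the stop index 3 is excluded
column_1 = X[:, 1]   # every row, column 1: 1, 5, 9
element = X[1, 2]    # row 1, column 2: 6

print(last_row, rows_1_2.shape, column_1, element)
```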

Writing

Assignment works the same way:

X[1, 2] = 17
X

A slice on the left sets multiple elements at once:

X[:2, :] = 12
X
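
To make the effect of the two writes concrete, here is the same sequence sketched in plain NumPy as a stand-in for mxnet.np (an assumption). Note that the slice assignment overwrites the 17 written a moment earlier, because row 1 falls inside X[:2, :].

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet.np

X = np.arange(12).reshape(3, 4)

X[1, 2] = 17               # write a single element
after_single = X.copy()    # snapshot for inspection

X[:2, :] = 12              # the scalar 12 is written to every entry of rows 0 and 1

print(after_single)
print(X)
```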

Elementwise

Most common math is applied elementwise — same shape in, same shape out.

np.exp(x)

The arithmetic operators are overloaded — +, -, *, /, ** all run elementwise:

x = np.array([1, 2, 4, 8])
y = np.array([2, 2, 2, 2])
x + y, x - y, x * y, x / y, x ** y
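
For reference, the values these operators produce on the two vectors above can be checked with plain NumPy as a stand-in for mxnet.np (an assumption; floats are used so that division and powers share one dtype).

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet.np

x = np.array([1.0, 2.0, 4.0, 8.0])
y = np.array([2.0, 2.0, 2.0, 2.0])

total = x + y        # 3, 4, 6, 10
difference = x - y   # -1, 0, 2, 6
product = x * y      # 2, 4, 8, 16
quotient = x / y     # 0.5, 1, 2, 4
power = x ** y       # 1, 4, 16, 64

print(total, difference, product, quotient, power)
```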

Concatenation

np.concatenate glues tensors along an existing axis. Pick the axis with the axis argument:

  • axis=0 → stack rows (more rows out)
  • axis=1 → stack columns (wider matrix out)

X = np.arange(12).reshape(3, 4)
Y = np.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
np.concatenate([X, Y], axis=0), np.concatenate([X, Y], axis=1)
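
The resulting shapes are the quickest way to confirm which axis grew. A sketch in plain NumPy as a stand-in for mxnet.np (an assumption):

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet.np

X = np.arange(12).reshape(3, 4)
Y = np.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

rows = np.concatenate([X, Y], axis=0)   # 3 + 3 rows, 4 columns
cols = np.concatenate([X, Y], axis=1)   # 3 rows, 4 + 4 columns

print(rows.shape, cols.shape)
```

All axes other than the one being joined must already have matching sizes.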

Comparisons and reductions

Comparison operators broadcast and return a boolean tensor of the same shape — useful for masking entries that satisfy a condition:

X == Y

sum, mean, max, … collapse one or more axes. Without an axis= argument the whole tensor reduces to a scalar:

X.sum()
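
The sketch below combines both ideas: a comparison mask used to select entries, and reductions along a chosen axis. It uses plain NumPy as a stand-in for mxnet.np (an assumption).

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet.np

X = np.arange(12).reshape(3, 4)
Y = np.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

mask = X == Y               # boolean array with the same shape as X
matches = X[mask]           # the entries of X where the two arrays agree
n_matches = int(mask.sum()) # True counts as 1 when summed

col_sums = X.sum(axis=0)    # collapse the row axis: one sum per column
row_sums = X.sum(axis=1)    # collapse the column axis: one sum per row
total = X.sum()             # no axis given: reduce everything

print(matches, n_matches, col_sums, row_sums, total)
```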

Broadcasting

When tensors of different shapes meet, the smaller one is virtually expanded along missing dimensions — no data copy.

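The rule: shapes are compared axis by axis starting from the last one, and a missing leading axis counts as size 1. Dimensions of size 1 stretch; everything else must match.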

a = np.arange(3).reshape(3, 1)
b = np.arange(2).reshape(1, 2)
a, b
a + b

A 3 × 1 tensor plus a 1 × 2 tensor becomes a 3 × 2 matrix.
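
To see what the stretch amounts to, the sketch below expands both operands explicitly and compares the result with the broadcast sum. It uses plain NumPy as a stand-in for mxnet.np (an assumption); np.broadcast_to is a NumPy function and may not be available under the same name in every framework.

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet.np

a = np.arange(3).reshape(3, 1)   # shape (3, 1)
b = np.arange(2).reshape(1, 2)   # shape (1, 2)

total = a + b                    # broadcast to shape (3, 2)

# The same result with the stretching made explicit (read-only views, no copy).
a_wide = np.broadcast_to(a, (3, 2))
b_tall = np.broadcast_to(b, (3, 2))
explicit = a_wide + b_tall

# Sizes that differ and are not 1 cannot be broadcast.
try:
    np.ones((3, 2)) + np.ones((3, 4))
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(total, mismatch_rejected)
```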

The hidden cost of Y = Y + X

Evaluating Y + X allocates a new tensor for the result, and the assignment rebinds Y to it. This matters a lot when Y occupies gigabytes:

before = id(Y)
Y = Y + X
id(Y) == before

id(Y) == before is False: Y now points at a brand-new buffer.

In-place operations

Pre-allocate the output and write into it with Z[:] = ...:

Z = np.zeros_like(Y)
print('id(Z):', id(Z))
Z[:] = X + Y
print('id(Z):', id(Z))

If the original value of X isn’t needed afterward, the most ergonomic forms are X[:] = X + Y or X += Y:

before = id(X)
X += Y
id(X) == before
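
The three patterns can be compared side by side. The sketch below uses plain NumPy as a stand-in for mxnet.np (an assumption), and the out= form shown at the end is NumPy's way of skipping the temporary; other frameworks may spell it differently.

```python
import numpy as np  # assumption: plain NumPy stands in for mxnet.np

X = np.ones((3, 4))
Y = np.full((3, 4), 2.0)

# Rebinding: Y + X creates a new array and Y now names that new object.
old_id = id(Y)
Y = Y + X
rebound = id(Y) != old_id

# Slice assignment: Z keeps its identity, though X + Y is still
# materialized as a temporary before being copied into Z.
Z = np.zeros_like(Y)
z_id = id(Z)
Z[:] = X + Y
same_Z = id(Z) == z_id

# out= writes the result straight into Z with no temporary.
np.add(X, Y, out=Z)

# Augmented assignment updates X in place.
x_id = id(X)
X += Y
same_X = id(X) == x_id

print(rebound, same_Z, same_X)
```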

NumPy round-trip

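Tensors and NumPy ndarrays convert easily in both directions. Whether the two share storage depends on the framework; in MXNet, asnumpy() returns a copy, so the result does not share memory with the original tensor: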

A = X.asnumpy()
B = np.array(A)
type(A), type(B)

A size-1 tensor unwraps to a Python scalar with a.item(), float(a), or int(a):

a = np.array([3.5])
a, a.item(), float(a), int(a)

Recap

  • arange / zeros / ones / random.normal / array(list): create.
  • .shape, .size, reshape: inspect / reorganize.
  • [i, j], [a:b, c:d]: read and write slices.
  • + - * / **, concatenate, ==, sum: elementwise ops, joins, comparisons, reductions.
  • Broadcasting stretches mismatched shapes; in-place ops avoid copying for large tensors.
  • .asnumpy() / .item(): leave the tensor world.