%matplotlib inline
from d2l import jax as d2l
import jax
from jax import numpy as jnp
import numpy as np

Image classification answers “what’s in the image”. Object detection answers “what’s in the image and where”: it locates one or more objects and returns both class labels and bounding boxes.
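A bounding box is a rectangle. Two equivalent parameterizations: the corner form (x1, y1, x2, y2), giving the upper-left and lower-right corners, and the center form (cx, cy, w, h), giving the box center plus its width and height.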
Models predict one or the other; we need conversion helpers. This deck sets up the basic plumbing; later sections build SSD and R-CNN on top.
box_corner_to_center and box_center_to_corner are inverses of each other. They are useful because some loss functions prefer center coordinates and some IoU code prefers corner coordinates. The corner-to-center mapping is:
c_x=\frac{x_1+x_2}{2},\quad c_y=\frac{y_1+y_2}{2},\quad w=x_2-x_1,\quad h=y_2-y_1.
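Solving these four equations for the corner coordinates gives the inverse mapping, which is the arithmetic box_center_to_corner performs. This is a short derivation using the same symbols as above:

```latex
x_1=c_x-\frac{w}{2},\quad y_1=c_y-\frac{h}{2},\quad x_2=c_x+\frac{w}{2},\quad y_2=c_y+\frac{h}{2}.
```

Substituting back confirms the round trip: for example, $c_x-\frac{w}{2}=\frac{x_1+x_2}{2}-\frac{x_2-x_1}{2}=x_1$.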
def box_corner_to_center(boxes):
    """Convert from (upper-left, lower-right) to (center, width, height)."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    w = x2 - x1
    h = y2 - y1
    boxes = d2l.stack((cx, cy, w, h), axis=-1)
    return boxes

def box_center_to_corner(boxes):
    """Convert from (center, width, height) to (upper-left, lower-right)."""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    x1 = cx - 0.5 * w
    y1 = cy - 0.5 * h
    x2 = cx + 0.5 * w
    y2 = cy + 0.5 * h
    boxes = d2l.stack((x1, y1, x2, y2), axis=-1)
    return boxes

Label dog and cat with hand-picked boxes; verify the conversion is round-trip exact. The visual check catches the most common bug: swapped width/height or x/y coordinates.
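The round-trip check might look like the sketch below. It re-declares the two converters with plain jax.numpy so that it runs without d2l. The dog and cat coordinates are illustrative pixel values assumed for this example, not measurements taken from the source.

```python
import jax.numpy as jnp


def box_corner_to_center(boxes):
    """(x1, y1, x2, y2) -> (cx, cy, w, h), one box per row."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return jnp.stack(((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1), axis=-1)


def box_center_to_corner(boxes):
    """(cx, cy, w, h) -> (x1, y1, x2, y2), one box per row."""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return jnp.stack(
        (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h), axis=-1)


# Assumed, hand-picked corner boxes (x1, y1, x2, y2) in pixels.
dog_bbox = [60.0, 45.0, 378.0, 516.0]
cat_bbox = [400.0, 112.0, 655.0, 493.0]
boxes = jnp.array([dog_bbox, cat_bbox])

# Convert to center form and back, then compare element-wise.
centers = box_corner_to_center(boxes)
restored = box_center_to_corner(centers)
round_trip_ok = bool(jnp.all(restored == boxes))
print(round_trip_ok)
```

Exact equality holds here because every intermediate value is a multiple of 0.5 and is representable in float32. For arbitrary coordinates, a tolerance-based comparison such as jnp.allclose is the safer check.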
A small helper that converts a single (x1, y1, x2, y2) box into a matplotlib Rectangle patch, which can then be added to an axis. We’ll reuse it everywhere in this chapter:
def bbox_to_rect(bbox, color):
    """Convert bounding box to matplotlib format."""
    # Convert the bounding box (upper-left x, upper-left y, lower-right x,
    # lower-right y) format to the matplotlib format: ((upper-left x,
    # upper-left y), width, height)
    return d2l.plt.Rectangle(
        xy=(bbox[0], bbox[1]), width=bbox[2]-bbox[0], height=bbox[3]-bbox[1],
        fill=False, edgecolor=color, linewidth=2)
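
A possible way to use the helper is sketched below. It is a standalone version that calls matplotlib.pyplot directly instead of d2l.plt and draws on an empty axis instead of a loaded image. The box coordinates, the 700 x 550 canvas size, the colors, and the Agg backend are all assumptions made so that the example runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # assumed non-interactive backend; drop this in a notebook
import matplotlib.pyplot as plt


def bbox_to_rect(bbox, color):
    """Turn an (x1, y1, x2, y2) box into an unfilled matplotlib Rectangle."""
    return plt.Rectangle(
        xy=(bbox[0], bbox[1]), width=bbox[2] - bbox[0],
        height=bbox[3] - bbox[1], fill=False, edgecolor=color, linewidth=2)


# Assumed corner-format boxes in pixels (illustrative values only).
dog_bbox = [60.0, 45.0, 378.0, 516.0]
cat_bbox = [400.0, 112.0, 655.0, 493.0]

fig, ax = plt.subplots()
ax.set_xlim(0, 700)
ax.set_ylim(550, 0)  # inverted y-axis so the origin sits at the top-left, as in images

dog_rect = bbox_to_rect(dog_bbox, "blue")
cat_rect = bbox_to_rect(cat_bbox, "red")
ax.add_patch(dog_rect)
ax.add_patch(cat_rect)
```

With a real image, the same add_patch calls would go on the axis returned by imshow. If a box comes out with its width and height swapped, it shows up at once as a rectangle that no longer encloses the object.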