%matplotlib inline
from d2l import torch as d2l
import torch

Image classification answers “what’s in the image”. Object detection answers “what’s in the image and where”: it locates one or more objects and returns both class labels and bounding boxes.
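A bounding box is an axis-aligned rectangle. Two equivalent parameterizations: corner form (x1, y1, x2, y2), the upper-left and lower-right corners, and center form (cx, cy, w, h), the center coordinates plus width and height.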
Models predict one or the other; we need conversion helpers. This deck sets up the basic plumbing; later sections build SSD and R-CNN on top.
box_corner_to_center and box_center_to_corner are inverses of each other. Both are useful because some loss functions prefer center coordinates and some IoU code prefers corner coordinates:
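c_x=\frac{x_1+x_2}{2},\quad c_y=\frac{y_1+y_2}{2},\quad w=x_2-x_1,\quad h=y_2-y_1,
and conversely
x_1=c_x-\frac{w}{2},\quad y_1=c_y-\frac{h}{2},\quad x_2=c_x+\frac{w}{2},\quad y_2=c_y+\frac{h}{2}.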
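As a quick sanity check of the formulas above, the arithmetic can be worked through for a single box with plain Python floats. The box coordinates and the scalar helper names `corner_to_center` and `center_to_corner` are illustrative assumptions, not part of the original code; the inverse direction simply subtracts and adds half the width and height around the center.

```python
# Illustrative scalar versions of the conversions (hypothetical helper names).
def corner_to_center(x1, y1, x2, y2):
    """(upper-left, lower-right) -> (center x, center y, width, height)."""
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def center_to_corner(cx, cy, w, h):
    """(center x, center y, width, height) -> (upper-left, lower-right)."""
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


# Made-up box: upper-left corner (10, 20), lower-right corner (50, 100).
center = corner_to_center(10.0, 20.0, 50.0, 100.0)
print(center)                     # (30.0, 60.0, 40.0, 80.0)
print(center_to_corner(*center))  # (10.0, 20.0, 50.0, 100.0)
```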
def box_corner_to_center(boxes):
    """Convert from (upper-left, lower-right) to (center, width, height)."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    w = x2 - x1
    h = y2 - y1
    boxes = d2l.stack((cx, cy, w, h), axis=-1)
    return boxes
def box_center_to_corner(boxes):
    """Convert from (center, width, height) to (upper-left, lower-right)."""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    x1 = cx - 0.5 * w
    y1 = cy - 0.5 * h
    x2 = cx + 0.5 * w
    y2 = cy + 0.5 * h
    boxes = d2l.stack((x1, y1, x2, y2), axis=-1)
    return boxes

Label the dog and the cat with hand-picked boxes, then verify that the conversion round-trips: corner to center and back should return the original boxes. Drawing the boxes as a visual check catches the most common bug: swapped width/height or x/y coordinates.
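The round-trip check described above might look like the following sketch. To stay self-contained it redefines the two converters with `torch.stack` in place of `d2l.stack`, and the pixel coordinates for the dog and cat boxes are illustrative assumptions rather than values given in the text.

```python
import torch


def box_corner_to_center(boxes):
    """(x1, y1, x2, y2) -> (cx, cy, w, h) for a tensor of shape (n, 4)."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return torch.stack(((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1), dim=-1)


def box_center_to_corner(boxes):
    """(cx, cy, w, h) -> (x1, y1, x2, y2) for a tensor of shape (n, 4)."""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    return torch.stack(
        (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h), dim=-1)


# Assumed, hand-picked corner-format boxes in pixels (illustrative only).
dog_bbox = [60.0, 45.0, 378.0, 516.0]
cat_bbox = [400.0, 112.0, 655.0, 493.0]
boxes = torch.tensor((dog_bbox, cat_bbox))

round_trip = box_center_to_corner(box_corner_to_center(boxes))
# allclose tolerates floating-point rounding, which exact equality would not.
print(torch.allclose(round_trip, boxes))  # True
```

Comparing with `torch.allclose` rather than `==` is a deliberate choice here: the halving and re-adding can introduce rounding for arbitrary float coordinates, even though these particular values convert exactly.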
A small helper that converts one (x1, y1, x2, y2) box into a matplotlib Rectangle patch, ready to be added to an axis. We’ll reuse it throughout this chapter:
def bbox_to_rect(bbox, color):
    """Convert bounding box to matplotlib format."""
    # Convert the bounding box (upper-left x, upper-left y, lower-right x,
    # lower-right y) format to the matplotlib format: ((upper-left x,
    # upper-left y), width, height)
    return d2l.plt.Rectangle(
        xy=(bbox[0], bbox[1]), width=bbox[2]-bbox[0], height=bbox[3]-bbox[1],
        fill=False, edgecolor=color, linewidth=2)
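A sketch of how the helper might be used: it draws two boxes on an empty axis instead of the chapter's image, since no image path is given here. It calls `matplotlib.pyplot` directly rather than `d2l.plt`, and the axis extent, box coordinates, and colors are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def bbox_to_rect(bbox, color):
    """Convert an (x1, y1, x2, y2) box to a matplotlib Rectangle patch."""
    return plt.Rectangle(
        xy=(bbox[0], bbox[1]), width=bbox[2] - bbox[0], height=bbox[3] - bbox[1],
        fill=False, edgecolor=color, linewidth=2)


fig, ax = plt.subplots()
# Assumed canvas extent; the y-axis is flipped so that, as in image
# coordinates, the origin sits at the upper-left and y grows downward.
ax.set_xlim(0, 728)
ax.set_ylim(561, 0)

# Assumed corner-format boxes and colors (illustrative only).
dog_rect = bbox_to_rect([60.0, 45.0, 378.0, 516.0], "blue")
cat_rect = bbox_to_rect([400.0, 112.0, 655.0, 493.0], "red")
ax.add_patch(dog_rect)
ax.add_patch(cat_rect)

# Inspect the patch geometry: anchor point, width, height.
print(dog_rect.get_xy(), dog_rect.get_width(), dog_rect.get_height())
```

With a real image, the same `ax.add_patch` calls would follow an `imshow` of that image, which already places the origin at the upper-left.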