Object Detection and Bounding Boxes

Bounding Boxes

Image classification answers “what’s in the image”. Object detection answers “what’s in the image and where” — locate one or more objects and return both class labels and bounding boxes.

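A bounding box is an axis-aligned rectangle. In image coordinates the origin is the upper-left corner, with x increasing to the right and y increasing downward. Two equivalent parameterizations: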

Corner: (x_1, y_1, x_2, y_2) — top-left and bottom-right.
Center: (c_x, c_y, w, h) — center plus width and height.

Models predict one or the other; we need conversion helpers. This deck sets up the basic plumbing; later sections build SSD and R-CNN on top.

%matplotlib inline
from d2l import torch as d2l
import torch

d2l.set_figsize()
img = d2l.plt.imread('../img/catdog.jpg')
d2l.plt.imshow(img);

Box format conversion

box_corner_to_center and box_center_to_corner are inverses of each other. Both are useful because some loss functions prefer center coordinates and some IoU code prefers corner coordinates. The corner-to-center direction is:

c_x=\frac{x_1+x_2}{2},\quad c_y=\frac{y_1+y_2}{2},\quad w=x_2-x_1,\quad h=y_2-y_1.
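
Solving the four equations above for the corners gives the inverse direction. This is a restatement of what box_center_to_corner computes, not an additional assumption:

x_1=c_x-\frac{w}{2},\quad y_1=c_y-\frac{h}{2},\quad x_2=c_x+\frac{w}{2},\quad y_2=c_y+\frac{h}{2}.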

def box_corner_to_center(boxes):
    """Convert from (upper-left, lower-right) to (center, width, height)."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    w = x2 - x1
    h = y2 - y1
    boxes = d2l.stack((cx, cy, w, h), axis=-1)
    return boxes


def box_center_to_corner(boxes):
    """Convert from (center, width, height) to (upper-left, lower-right)."""
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    x1 = cx - 0.5 * w
    y1 = cy - 0.5 * h
    x2 = cx + 0.5 * w
    y2 = cy + 0.5 * h
    boxes = d2l.stack((x1, y1, x2, y2), axis=-1)
    return boxes
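
To make the arithmetic concrete, here is a minimal single-box sketch in plain Python, so it runs without d2l or torch. The function names corner_to_center and center_to_corner are hypothetical stand-ins for the tensor helpers above, and the box values are picked by hand for illustration.

```python
def corner_to_center(box):
    """(x1, y1, x2, y2) -> (cx, cy, w, h) for a single box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def center_to_corner(box):
    """(cx, cy, w, h) -> (x1, y1, x2, y2) for a single box."""
    cx, cy, w, h = box
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


corner = (60.0, 45.0, 378.0, 516.0)   # illustrative corner-format box
center = corner_to_center(corner)
print(center)                         # (219.0, 280.5, 318.0, 471.0)
print(center_to_corner(center))       # (60.0, 45.0, 378.0, 516.0)
```

The tensor versions do the same thing column by column, so one call converts a whole batch of boxes.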

Annotating an image

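Label the dog and the cat with hand-picked corner-format boxes, then verify that converting to center format and back reproduces them exactly. The round trip alone cannot catch every mistake (two helpers that swap width and height consistently would still pass), so we also draw the boxes in the next step. That visual check catches the most common bug: swapped width/height or x/y coordinates.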

# Here `bbox` is the abbreviation for bounding box
dog_bbox, cat_bbox = [60.0, 45.0, 378.0, 516.0], [400.0, 112.0, 655.0, 493.0]

boxes = d2l.tensor((dog_bbox, cat_bbox))
box_center_to_corner(box_corner_to_center(boxes)) == boxes

tensor([[True, True, True, True],
        [True, True, True, True]])
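
Exact equality works here because these coordinates and their half-sums are integers or halves, which floating point represents exactly. For arbitrary float boxes a tolerance-based comparison is safer. The sketch below uses plain Python lists and a hypothetical helper named boxes_close; with tensors, torch.allclose plays a similar role.

```python
import math


def boxes_close(boxes_a, boxes_b, abs_tol=1e-6):
    """Compare two lists of 4-number boxes element-wise within a tolerance."""
    return [
        [math.isclose(a, b, abs_tol=abs_tol) for a, b in zip(box_a, box_b)]
        for box_a, box_b in zip(boxes_a, boxes_b)
    ]


# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
computed = [[0.1 + 0.2, 0.0, 1.0, 1.0]]
expected = [[0.3, 0.0, 1.0, 1.0]]

exact = [[a == b for a, b in zip(x, y)] for x, y in zip(computed, expected)]
print(exact)                            # [[False, True, True, True]]
print(boxes_close(computed, expected))  # [[True, True, True, True]]
```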

Drawing boxes

A small helper that converts a single (x1, y1, x2, y2) box into a matplotlib Rectangle patch, which we then add to the image's axes. We’ll reuse it everywhere in this chapter:

def bbox_to_rect(bbox, color):
    """Convert bounding box to matplotlib format."""
    # Convert the bounding box (upper-left x, upper-left y, lower-right x,
    # lower-right y) format to the matplotlib format: ((upper-left x,
    # upper-left y), width, height)
    return d2l.plt.Rectangle(
        xy=(bbox[0], bbox[1]), width=bbox[2]-bbox[0], height=bbox[3]-bbox[1],
        fill=False, edgecolor=color, linewidth=2)

fig = d2l.plt.imshow(img)
fig.axes.add_patch(bbox_to_rect(dog_bbox, 'blue'))
fig.axes.add_patch(bbox_to_rect(cat_bbox, 'red'));
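
The visual check can be complemented by a cheap programmatic one. The sketch below is a hypothetical helper, check_corner_box, that flags corner-format boxes with inverted or out-of-image coordinates. The image size (728 wide, 561 high) is an assumption for illustration; in the notebook it would come from img.shape. It catches only some mistakes: an x/y swap can still produce a box that passes, which is why drawing the boxes remains worthwhile.

```python
def check_corner_box(bbox, img_width, img_height):
    """Return a list of problems found in a corner-format box (empty if none)."""
    x1, y1, x2, y2 = bbox
    problems = []
    if x2 <= x1:
        problems.append("x2 must be greater than x1")
    if y2 <= y1:
        problems.append("y2 must be greater than y1")
    if x1 < 0 or x2 > img_width:
        problems.append("x coordinates fall outside the image")
    if y1 < 0 or y2 > img_height:
        problems.append("y coordinates fall outside the image")
    return problems


# Assumed image size, for illustration only.
WIDTH, HEIGHT = 728, 561

good = check_corner_box([60.0, 45.0, 378.0, 516.0], WIDTH, HEIGHT)
# A center-format box (cx, cy, w, h) passed by mistake as corner format:
bad = check_corner_box([527.5, 302.5, 255.0, 381.0], WIDTH, HEIGHT)
print(good)  # []
print(bad)   # ['x2 must be greater than x1']
```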

Recap

Bounding box: a rectangle pinning down where an object is.
Two parameterizations, corner and center; converting between them is a four-line linear map.
Detection ground truth = (class, box) per object.
The drawing helpers established here are reused for anchor generation, NMS visualization, and the SSD demo throughout this chapter.