%matplotlib inline
from d2l import mxnet as d2l
from mxnet import image, np, npx
npx.set_np()
img = image.imread('../img/catdog.jpg')
h, w = img.shape[:2]
h, w

A single feature map cannot detect objects at all scales: small objects are tiny on the deep feature maps, and large objects do not fit in the receptive field of the early ones. The fix is to generate anchors on multiple feature maps, each tuned to a different size range.
The recipe — used by SSD and FPN:
Each feature map gets its own classification and regression heads. Predictions from all maps are concatenated, and non-maximum suppression (NMS) then prunes the result.
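The concatenate-then-prune step can be sketched in plain Python. This is a minimal illustration, not the SSD or FPN implementation: the helper names `iou` and `merge_and_nms`, the 0.5 threshold, and the sample boxes are all hypothetical, and the NMS is class-agnostic for brevity.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_and_nms(per_map_preds, iou_threshold=0.5):
    """Concatenate (box, score) predictions from every feature map,
    then apply greedy NMS. The threshold value is an assumption."""
    # Step 1: concatenate predictions from all feature maps.
    candidates = [p for preds in per_map_preds for p in preds]
    # Step 2: visit candidates from highest to lowest score.
    candidates.sort(key=lambda p: p[1], reverse=True)
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap a kept box too much.
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


# Hypothetical predictions from two feature maps (normalized coordinates).
fine_map = [((0.10, 0.10, 0.30, 0.30), 0.9),
            ((0.12, 0.11, 0.31, 0.29), 0.6)]
coarse_map = [((0.05, 0.05, 0.95, 0.95), 0.8)]
kept = merge_and_nms([fine_map, coarse_map])
print(kept)
```

In this toy input the second fine-map box nearly duplicates the first and is suppressed, while the large coarse-map box survives because it overlaps the small box only slightly.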
Center $n + m - 1$ anchors on each pixel of the feature map, where $n$ is the number of sizes and $m$ the number of aspect ratios. The pixel positions back-project to image coordinates, so a smaller feature map means fewer candidate centers but larger receptive fields:
def display_anchors(fmap_w, fmap_h, s):
    d2l.set_figsize()
    # Values on the first two dimensions do not affect the output
    fmap = np.zeros((1, 10, fmap_h, fmap_w))
    anchors = npx.multibox_prior(fmap, sizes=s, ratios=[1, 2, 0.5])
    bbox_scale = np.array((w, h, w, h))
    d2l.show_bboxes(d2l.plt.imshow(img.asnumpy()).axes,
                    anchors[0] * bbox_scale)

A $4 \times 4$ feature map with a small anchor scale gives dense coverage of small image regions. Notice the many anchor centers: that density is what small objects need.
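To make the density concrete, the following sketch counts the anchor centers and anchors for a $4 \times 4$ feature map. It is plain Python rather than a call into MXNet; the helper names are hypothetical, the half-pixel offset (centers at cell midpoints) is an assumed convention, and the count assumes `s` holds a single size alongside the three ratios used in `display_anchors`.

```python
def anchor_centers(fmap_w, fmap_h):
    """Normalized (x, y) centers, one per feature-map pixel, row by row.
    Assumes each center sits at the midpoint of its cell."""
    return [((j + 0.5) / fmap_w, (i + 0.5) / fmap_h)
            for i in range(fmap_h) for j in range(fmap_w)]


def num_anchors(fmap_w, fmap_h, num_sizes, num_ratios):
    """Total anchors when each pixel gets num_sizes + num_ratios - 1 boxes."""
    return fmap_w * fmap_h * (num_sizes + num_ratios - 1)


centers = anchor_centers(4, 4)
total = num_anchors(4, 4, num_sizes=1, num_ratios=3)
print(len(centers), centers[0], centers[-1], total)
```

Under these assumptions the 16 centers are spread uniformly over the image, and halving each side of the feature map would cut the number of centers by a factor of four.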
A $2 \times 2$ feature map with a larger anchor scale yields fewer anchors, each covering more area.
A $1 \times 1$ feature map with anchor scale 0.8 has a single anchor center in the middle of the image, with one box per aspect ratio, each spanning most of the image. This level cannot localize tiny details, but it matches objects that occupy most of the image.
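As a rough check on how large these boxes are, the sketch below computes anchor shapes for scale 0.8 and ratios 1, 2, and 0.5. It assumes the common convention of width $s\sqrt{r}$ and height $s/\sqrt{r}$ in normalized units; library implementations may additionally rescale by the image aspect ratio, so the numbers are illustrative. The helper names are hypothetical.

```python
import math


def anchor_shapes(size, ratios):
    """(width, height) per ratio, assuming w = s*sqrt(r), h = s/sqrt(r)."""
    return [(size * math.sqrt(r), size / math.sqrt(r)) for r in ratios]


def centered_box(cx, cy, w, h):
    """Corner form (x1, y1, x2, y2) of a box centered at (cx, cy)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


shapes = anchor_shapes(0.8, [1, 2, 0.5])
# A 1 x 1 feature map has one pixel, so every box is centered at (0.5, 0.5).
boxes = [centered_box(0.5, 0.5, w, h) for w, h in shapes]
for (w, h), box in zip(shapes, boxes):
    print(f"w={w:.3f} h={h:.3f} box={tuple(round(v, 3) for v in box)}")
```

Under this convention every box has the same normalized area, $0.8^2 = 0.64$, and the ratio-2 box is wider than the image, so it extends past the left and right borders.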