%matplotlib inline
from d2l import tensorflow as d2l
import tensorflow as tf
img = d2l.plt.imread('../img/catdog.jpg')
h, w = img.shape[:2]
h, w
(561, 728)
A single feature map cannot detect objects at all scales: small objects shrink to almost nothing on the deep feature maps, and large objects do not fit in the receptive field of the early ones. The fix is to generate anchors on multiple feature maps, each tuned to a different size range.
The recipe, used by SSD and FPN: each feature map gets its own classification and regression heads. Predictions from all maps are concatenated, then non-maximum suppression (NMS) prunes the result.
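The concatenate-then-prune step can be illustrated with a minimal sketch. It is not the SSD or d2l implementation: the boxes, scores, and the 0.5 IoU threshold are hypothetical values chosen for illustration, and the NMS here is a plain greedy, class-agnostic variant written in pure Python.

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) form."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical predictions from a fine and a coarse feature map
# (normalized coordinates, made up for this example).
fine_boxes = [(0.10, 0.10, 0.30, 0.30), (0.12, 0.11, 0.31, 0.32)]
fine_scores = [0.9, 0.6]
coarse_boxes = [(0.05, 0.05, 0.95, 0.95)]
coarse_scores = [0.8]

# Concatenate the predictions of all levels, then prune once.
all_boxes = fine_boxes + coarse_boxes
all_scores = fine_scores + coarse_scores
kept = nms(all_boxes, all_scores)
print(kept)
```

In this toy input the two fine-level boxes overlap heavily, so only the higher-scoring one survives, while the coarse-level box is kept because its IoU with the small box is low.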
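Center $n + m - 1$ anchors on each pixel of the feature map, where $n$ is the number of sizes and $m$ the number of aspect ratios passed to multibox_prior. Pixel positions on the feature map correspond to uniformly spaced points in image coordinates, so a smaller feature map means fewer candidate centers, each backed by a larger receptive field: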
def display_anchors(fmap_w, fmap_h, s):
    d2l.set_figsize()
    # Values on the first two dimensions do not affect the output
    fmap = tf.zeros((1, 10, fmap_h, fmap_w))
    anchors = d2l.multibox_prior(fmap, sizes=s, ratios=[1, 2, 0.5])
    bbox_scale = tf.constant((w, h, w, h), dtype=tf.float32)
    d2l.show_bboxes(d2l.plt.imshow(img).axes,
                    anchors[0] * bbox_scale)
$4 \times 4$ feature map, small anchor scale: dense coverage of small image regions. Notice the many anchor centers; that density is what small objects need.
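To make the density argument concrete, the following sketch counts anchors and lists anchor centers for a given feature-map shape. It is a pure-Python illustration, not the d2l code: the helper names `anchor_centers` and `num_anchors` are hypothetical, and the half-pixel offset assumes the common convention of placing each center in the middle of its feature-map cell.

```python
def anchor_centers(fmap_h, fmap_w):
    """Normalized (x, y) centers, one per feature-map pixel.

    Assumes each center sits in the middle of its cell (offset 0.5).
    """
    return [((j + 0.5) / fmap_w, (i + 0.5) / fmap_h)
            for i in range(fmap_h) for j in range(fmap_w)]


def num_anchors(fmap_h, fmap_w, n_sizes, n_ratios):
    """Total anchors when each pixel carries n + m - 1 anchors."""
    return fmap_h * fmap_w * (n_sizes + n_ratios - 1)


img_h, img_w = 561, 728  # image shape printed earlier

# One size and three ratios per pixel, as in display_anchors.
for side in (4, 2, 1):
    print(side, num_anchors(side, side, n_sizes=1, n_ratios=3))

# Map the centers of a 2 x 2 feature map back to pixel coordinates.
centers = anchor_centers(2, 2)
pixel_centers = [(x * img_w, y * img_h) for x, y in centers]
print(pixel_centers)
```

With one size and three ratios, the count falls from 48 anchors on a 4 x 4 map to 12 on a 2 x 2 map and 3 on a 1 x 1 map, which is the trade between coverage density and per-anchor area described in the text.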
$2 \times 2$ feature map, larger anchor scale: fewer anchors, each covering more area.
$1 \times 1$ feature map, anchor scale 0.8: a single anchor center in the middle of the image, with boxes in several aspect ratios that cover most of it. This level cannot localize tiny details, but it matches objects that occupy most of the image.
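A rough calculation shows how much of the image an anchor at scale 0.8 covers. The sketch assumes the frequently used parameterization in which a box with scale $s$ and aspect ratio $r$ has normalized width $s\sqrt{r}$ and height $s/\sqrt{r}$; the exact normalization inside d2l.multibox_prior may differ, and `anchor_shape` is a hypothetical helper.

```python
import math


def anchor_shape(s, r):
    """Normalized (width, height) of an anchor box.

    Assumption: width = s * sqrt(r), height = s / sqrt(r), both expressed
    as fractions of the image width and height respectively.
    """
    return s * math.sqrt(r), s / math.sqrt(r)


# Scale 0.8 with the three ratios used in display_anchors.
shapes = {r: anchor_shape(0.8, r) for r in (1, 2, 0.5)}
for r, (bw, bh) in shapes.items():
    print(f"ratio {r}: width {bw:.3f}, height {bh:.3f}, area {bw * bh:.2f}")
```

Under this assumption every box covers 0.8² = 64% of the image area regardless of ratio, and the ratio-2 box is wider than the image itself, so it would extend past the borders unless clipped.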