YOLOv3

Summary

We made a bunch of little design changes to make YOLO better

We will tell you:

During training, use sum of squared error (MSE) loss for the box coordinates
The objectness score (predicted with logistic regression) should be 1 if the bounding box prior overlaps a ground truth object by more than any other bounding box prior
If the bounding box prior is not the best but does overlap a ground truth object by more than some threshold (0.5), we ignore the prediction
The system only assigns one bounding box prior to each ground truth object
If a bounding box prior is not assigned to a ground truth object, it incurs no loss for coordinate or class predictions, only objectness
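The assignment rules above can be sketched as follows. This is a minimal illustration, not the Darknet implementation: boxes are assumed to be (x1, y1, x2, y2) corners, the function names are hypothetical, ties between ground truths are broken arbitrarily, and the objectness loss is assumed to be binary cross-entropy (the natural loss for a score predicted with logistic regression).

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def assign_priors(priors, truths, ignore_thresh=0.5):
    """Label every prior as 'pos', 'ignore' or 'neg'.

    pos    -> objectness target 1; coordinate and class losses apply
    ignore -> contributes no loss at all
    neg    -> objectness target 0; no coordinate or class loss
    """
    labels = ["neg"] * len(priors)
    matched = [None] * len(priors)  # index of the assigned ground truth, if any
    # Any prior overlapping some truth above the threshold is ignored by default...
    for i, p in enumerate(priors):
        if any(iou(p, t) > ignore_thresh for t in truths):
            labels[i] = "ignore"
    # ...unless it is the single best prior for a truth (one prior per truth).
    # If two truths share a best prior, the later truth wins in this sketch.
    for j, t in enumerate(truths):
        best = max(range(len(priors)), key=lambda i: iou(priors[i], t))
        labels[best] = "pos"
        matched[best] = j
    return labels, matched


def objectness_loss(scores, labels, eps=1e-7):
    """Binary cross-entropy on objectness scores, skipping ignored priors."""
    total = 0.0
    for s, lab in zip(scores, labels):
        if lab == "ignore":
            continue
        target = 1.0 if lab == "pos" else 0.0
        s = min(max(s, eps), 1 - eps)  # clamp to avoid log(0)
        total -= target * math.log(s) + (1 - target) * math.log(1 - s)
    return total
```

For example, with priors `(0, 0, 10, 10)`, `(1, 0, 11, 10)` and `(20, 20, 30, 30)` and one truth `(0, 0, 10, 10)`, the first prior is positive, the second (IoU of about 0.82) is ignored, and the third is a negative that only receives objectness loss.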

YOLOv3 predicts boxes at 3 different scales
For each finer scale, we upsample the previous feature map by 2× and merge it, by concatenation, with a feature map taken from earlier in the network
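A small sketch of the shapes involved, assuming a 416×416 input and strides 32, 16 and 8 for the three scales (a common Darknet configuration, not stated in these notes). The paper's COCO output tensor is N × N × [3 × (4 + 1 + 80)]: 3 boxes per cell, each with 4 box offsets, 1 objectness score and 80 class scores. The upsample-and-concatenate step is shown on tiny nested lists laid out as [channel][row][column]; the helper names are hypothetical.

```python
def head_shapes(input_size=416, num_classes=80, boxes_per_scale=3,
                strides=(32, 16, 8)):
    """Output tensor shape (N, N, channels) at each of the 3 scales."""
    # 4 box offsets + 1 objectness score + one score per class, per box
    channels = boxes_per_scale * (4 + 1 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a [C][H][W] nested-list feature map."""
    return [
        [[row[x // 2] for x in range(2 * len(row))] for row in ch for _ in range(2)]
        for ch in fmap
    ]


def merge(coarse, fine):
    """Upsample the coarse map, then concatenate it with a finer, earlier map along channels."""
    up = upsample2x(coarse)
    # Spatial sizes must agree before channel-wise concatenation
    assert len(up[0]) == len(fine[0]) and len(up[0][0]) == len(fine[0][0])
    return up + fine
```

`head_shapes()` gives 13×13, 26×26 and 52×52 grids, each with 255 channels; the finest grid has one cell per 8×8 pixels, which is what helps with small objects.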

We use a new network for performing feature extraction: Darknet-53

Darknet-53 has similar performance to ResNet-152 and is 2 times faster
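Where the "53" comes from, as I count the architecture table in the paper: one stem convolution, one stride-2 convolution before each of five stages, and residual blocks (a 1×1 then a 3×3 convolution) repeated 1, 2, 8, 8 and 4 times. That gives 52 convolutional layers, plus the final fully connected layer. The function names are hypothetical; treat this as a sketch of the count rather than a model definition.

```python
# Residual blocks per stage, as read from Darknet-53's architecture table
RESIDUAL_BLOCKS = [1, 2, 8, 8, 4]


def darknet53_conv_layers(blocks=RESIDUAL_BLOCKS):
    stem = 1                    # initial 3x3 conv, 32 filters
    downsample = len(blocks)    # one stride-2 3x3 conv before each stage
    residual = 2 * sum(blocks)  # each residual block: 1x1 conv then 3x3 conv
    return stem + downsample + residual


def total_stride(blocks=RESIDUAL_BLOCKS):
    return 2 ** len(blocks)     # each stride-2 conv halves the resolution


conv = darknet53_conv_layers()
print(conv, conv + 1, total_stride())  # +1 for the fully connected layer
```

The five stride-2 convolutions also explain why the coarsest detection grid is the input size divided by 32.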

We use all the standard training stuff: multi-scale training, lots of data augmentation, batch normalization

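On COCO's average AP metric, YOLOv3 is quite a bit behind RetinaNet; on AP50 it performs similarly (57.9 vs 57.5) while being 3.8 times faster (51 ms vs 198 ms on a Titan X)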

YOLOv3 is a very strong detector that excels at producing decent boxes for objects, but struggles to get the boxes perfectly aligned with the object
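Why "decent but not perfectly aligned" matters: COCO's AP50 counts a detection as correct at IoU ≥ 0.5, while AP75 requires IoU ≥ 0.75. The boxes below are hypothetical numbers chosen for illustration: a prediction of the right size, shifted by 2 pixels, passes the first threshold and fails the second. Boxes are assumed to be (x1, y1, x2, y2) corners.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


truth = (0, 0, 10, 10)
pred = (2, 0, 12, 10)          # correct size, shifted 2 px horizontally
overlap = iou(truth, pred)     # intersection 80, union 120
print(overlap >= 0.5, overlap >= 0.75)
```

Such a box is a hit under AP50 but a miss under AP75, which is why YOLOv3's numbers drop as the IoU threshold rises.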

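Small objects: earlier YOLO versions struggled with them; with multi-scale predictions YOLOv3 now has relatively high AP on small objects, but comparatively worse performance on medium and large objects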

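Things we tried that didn't work:
Anchor box x, y offset predictions (predicting the offset as a multiple of the box width or height with a linear activation): this decreased model stability and didn't work very well
Linear x, y predictions instead of logistic (a linear activation predicting the x, y offset directly): a couple of points drop in mAP
Focal loss: it dropped our mAP by about 2 points
Dual IOU thresholds and truth assignment (as in Faster R-CNN): couldn't get good results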
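The logistic-versus-linear point can be illustrated with the YOLOv2/v3 box decoding, b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·e^(t_w), b_h = p_h·e^(t_h), where (c_x, c_y) is the cell's offset in grid units and (p_w, p_h) is the prior's size. The "linear" variant below is my reading of the failed alternative (t_x added directly to c_x); function names are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_logistic(tx, ty, tw, th, cx, cy, pw, ph):
    """Logistic x, y: the predicted centre always stays inside cell (cx, cy)."""
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh


def decode_linear(tx, ty, tw, th, cx, cy, pw, ph):
    """Linear x, y: the centre offset is unbounded and can leave the cell."""
    return tx + cx, ty + cy, pw * math.exp(tw), ph * math.exp(th)
```

With t_x = 5 in cell (3, 4), the logistic version puts the centre just below x = 4, still inside the cell, while the linear version jumps to x = 8. Bounding the offset is one plausible reason the logistic form trains more stably.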

skip