Implementing YOLO Object Detector from Scratch with PyTorch

Posted By : Niraj Bhattarai | 17-Dec-2019

python

PyTorch

PyTorch is an open-source machine learning library developed by researchers at Facebook's AI lab in October 2016. It is used for applications such as computer vision and natural language processing.

What is YOLO

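YOLO stands for "You Only Look Once". It is a state-of-the-art approach to detecting objects in real time. YOLO is fast compared to other detectors, because it predicts all bounding boxes in a single pass over the image, while remaining accurate.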

The Concept of YOLO

YOLO is an object detection model that is fast enough to run in real time. At the time of writing, it has three versions.
YOLO divides the input image into an SxS grid. Each grid cell predicts only one object.
To see this on a smaller scale, divide the image into a 3x3 grid (9 grid cells) and assign each object to the grid cell that contains the object's center. That grid cell is responsible for predicting the object.
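
The cell assignment described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's implementation: the function name `responsible_cell` and the pixel-coordinate convention (origin at the top-left corner) are assumptions made for the example.

```python
def responsible_cell(cx, cy, img_w, img_h, S=3):
    """Return (row, col) of the grid cell that contains the box center.

    cx, cy are the center coordinates in pixels (assumed origin: top-left).
    S is the number of cells along each side of the grid.
    """
    # Scale the center into grid units and truncate to a cell index.
    # min(...) keeps a center lying exactly on the right/bottom edge in range.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


# A 300x300 image with an object centered at (160, 40):
# the center falls in the top row, middle column.
print(responsible_cell(160, 40, 300, 300))  # (0, 1)
```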

grid image

Source="https://dzone.com/articles/understanding-object-detection-using-yolo"

Anchor Boxes

Grid cells alone have a limitation.
Suppose multiple objects fall in the same grid cell. For instance, a person is standing in front of a car and the centers of their bounding boxes are very close. Should the cell predict the person or the car?
Anchor boxes solve this by letting one grid cell detect multiple objects centered in it.

lady in front of car

Source="https://heartbeat.fritz.ai/gentle-guide-on-how-yolo-object-localization-works-with-keras-part-2-65fe59ac12d"

Suppose we choose two anchor boxes with different shapes. To decide which object goes with which anchor box, we compare shapes and assign each object to the anchor box whose shape is most similar to the object's bounding box. In the example above, the person is associated with the tall anchor box since their shapes are similar.
As a result, the output of one grid cell is extended to contain information for two anchor boxes.
For example, the center grid cell in the image now has 8x2 = 16 output labels in total (8 values for each of the 2 anchor boxes), so the full output has shape 3x3x16.
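
One common way to measure how similar a bounding box is to an anchor box is to compare only their widths and heights, as if both boxes shared the same center. The sketch below does that in plain Python. The anchor sizes and the person and car box sizes are hypothetical values chosen for illustration, and the helper names `shape_iou` and `best_anchor` are not from the original article.

```python
def shape_iou(box_wh, anchor_wh):
    """Overlap ratio of two boxes compared by shape only (same center assumed)."""
    w, h = box_wh
    aw, ah = anchor_wh
    inter = min(w, aw) * min(h, ah)
    union = w * h + aw * ah - inter
    return inter / union


def best_anchor(box_wh, anchors):
    """Index of the anchor whose shape is most similar to the box."""
    scores = [shape_iou(box_wh, a) for a in anchors]
    return max(range(len(anchors)), key=lambda i: scores[i])


# Hypothetical anchors in grid-cell units: index 0 is tall, index 1 is wide.
anchors = [(1.0, 3.0), (3.0, 1.0)]
person = (0.9, 2.8)  # tall, narrow box (hypothetical)
car = (2.7, 1.2)     # wide, short box (hypothetical)

print(best_anchor(person, anchors))  # 0 (tall anchor)
print(best_anchor(car, anchors))     # 1 (wide anchor)
```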

The General Formula

(NxN)x[num_anchors x (5+num_classes)]
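
As a quick check of the formula, the 5 is usually read as the objectness score pc plus the four box coordinates, with one extra value per class. The helper below is a small sketch (the name `yolo_output_shape` is made up for this example) that reproduces the 3x3x16 shape from the earlier example, assuming 3 classes so that each anchor box has 5 + 3 = 8 values.

```python
def yolo_output_shape(grid_size, num_anchors, num_classes):
    """Shape of the prediction tensor: N x N x [num_anchors * (5 + num_classes)]."""
    values_per_anchor = 5 + num_classes  # pc, 4 box coordinates, class scores
    return (grid_size, grid_size, num_anchors * values_per_anchor)


# 3x3 grid, 2 anchor boxes, 3 classes (assumed) -> 8 values per anchor box.
print(yolo_output_shape(3, 2, 3))  # (3, 3, 16)
```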

Evaluation metric

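Predicted boxes are evaluated with Intersection over Union (IoU): the area of overlap between two boxes divided by the area of their union. Instead of defining a box by its center point, width, and height, let's define it using its two corners (upper left and lower right): (x1, y1, x2, y2).
To compute the intersection of two boxes, we start by finding the two corners of the intersection area.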
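
The metric discussed at the source linked below is intersection over union (IoU): the intersection area divided by the union area. Here is a minimal plain-Python sketch of that computation using the corner format (x1, y1, x2, y2). The function name `iou` and the sample boxes are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Corners of the intersection rectangle.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```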

Source="https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/"

Non-max Suppression

Non-max suppression is a common algorithm for cleaning up the output when multiple grid cells predict the same object.
For example, the model outputs three predictions for the truck in the center. There are three bounding boxes, but we only need one. The thicker the predicted bounding box, the more confident the prediction, which means a higher pc value. Non-max suppression keeps the box with the highest pc and discards the other boxes that overlap it heavily.
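
A greedy version of non-max suppression can be sketched as follows: repeatedly keep the box with the highest pc, then drop every remaining box whose IoU with it reaches a threshold. The boxes, scores, and the 0.5 threshold below are hypothetical values for illustration, and this plain-Python version favors readability over speed.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    # Candidate indices sorted by confidence (pc), best first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard candidates that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


# Three overlapping predictions for one object plus one separate object.
boxes = [(100, 100, 200, 200), (105, 105, 205, 205),
         (98, 96, 198, 204), (300, 300, 400, 400)]
scores = [0.9, 0.75, 0.6, 0.8]  # hypothetical pc values

kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 3]
```

Only the most confident of the three overlapping boxes survives, while the separate box is untouched because it does not overlap anything.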