Implementing YOLO Object Detector from Scratch with PyTorch
Posted By : Niraj Bhattarai | 17-Dec-2019
PyTorch
PyTorch is a machine learning library developed by Facebook researchers and used for computer vision and natural language processing. It came out of Facebook's AI research lab in October 2016.
What is YOLO
YOLO stands for You Only Look Once. It is a state-of-the-art approach to detecting objects in real time. YOLO is fast and accurate compared to other detectors.
The Concept of YOLO
- YOLO is a model for object detection. It is very fast and able to run in real time. At the time of writing it has three versions.
- YOLO divides the input image into an SxS grid. Each grid cell predicts only one object.
- Look at this on a smaller scale. Divide the image into a 3x3 grid (9 grid cells) and assign each object to the grid cell that contains the object's center. That grid cell is responsible for predicting the object.
Source="https://dzone.com/articles/understanding-object-detection-using-yolo"
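The cell assignment described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code of any particular YOLO release; the function name `responsible_cell` and the use of pixel coordinates for the box center are assumptions made for the example.

```python
def responsible_cell(cx, cy, img_w, img_h, S=3):
    """Return (row, col) of the grid cell that contains the box center.

    cx, cy are the box center in pixels (an assumed convention);
    S is the number of grid cells per side.
    """
    # Scale the center into grid units, then truncate to a cell index.
    # min(...) keeps a center lying exactly on the right/bottom edge inside the grid.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


# An object centered in the middle of a 300x300 image falls in the center cell.
print(responsible_cell(150, 150, 300, 300))
```

With a 3x3 grid, rows and columns are indexed from 0, so the center cell is `(1, 1)`.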
Anchor Boxes
- There is a limitation to using grid cells alone.
- Say we have multiple objects in the same grid cell. For instance, a person is standing in front of a car and the centers of their bounding boxes are very close together. Should the cell predict the person or the car?
- We solve this with anchor boxes, which let one grid cell detect multiple objects centered in it.
Source="https://heartbeat.fritz.ai/gentle-guide-on-how-yolo-object-localization-works-with-keras-part-2-65fe59ac12d"
- We choose anchor boxes with two different shapes. To decide which object goes with which anchor box, we compare shapes: each object is assigned to the anchor box whose shape is most similar to the object's bounding box. In the example above, the person is associated with the tall anchor box since their shapes are similar.
- As a result, the output of one grid cell is extended to contain information for two anchor boxes.
- For example, the center grid cell in the image now has 8x2 = 16 output labels in total (8 values for each of the two anchor boxes), so the output for the whole 3x3 grid has shape 3x3x16.
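One common way to measure the shape similarity mentioned above is to compare widths and heights as if both boxes shared the same center. The sketch below does that; the helper names and the anchor sizes are hypothetical values chosen only to mirror the person/car example.

```python
def shape_iou(wh_a, wh_b):
    """Overlap ratio of two (width, height) shapes placed on the same center."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def best_anchor(box_wh, anchors):
    """Index of the anchor whose shape is most similar to the box."""
    return max(range(len(anchors)), key=lambda i: shape_iou(box_wh, anchors[i]))


# Hypothetical anchors: index 0 is tall, index 1 is wide.
anchors = [(1.0, 3.0), (3.0, 1.0)]
person = (0.9, 2.8)  # tall, narrow box
car = (2.7, 1.2)     # wide, short box

print(best_anchor(person, anchors), best_anchor(car, anchors))
```

Here the person maps to the tall anchor and the car to the wide one, so both objects can be labeled in the same grid cell.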
The General Formula
(NxN)x[num_anchors x (5+num_classes)]
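As a quick check of the formula, the small helper below (a hypothetical name, written just for this example) computes the output shape. It assumes the 5 in the formula covers the confidence score plus four box coordinates, which is consistent with the 8 values per anchor box in the 3-class example above.

```python
def yolo_output_shape(grid_size, num_anchors, num_classes):
    """Shape of the prediction tensor: N x N x (num_anchors * (5 + num_classes))."""
    # Assumption: 5 = confidence score + 4 bounding-box values.
    return (grid_size, grid_size, num_anchors * (5 + num_classes))


# The running example: 3x3 grid, 2 anchor boxes, 3 classes.
print(yolo_output_shape(3, 2, 3))
```

For the running example this gives the 3x3x16 output described earlier.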
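Evaluation Metric
- The metric is intersection over union (IoU): the area where two boxes overlap divided by the area covered by both boxes together.
- Instead of defining a box by its center point, width and height, let's define it using its two corners (upper left and lower right): (x1, y1, x2, y2).
- To compute the intersection of two boxes, we start by finding the two corners of the intersection area.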
Source="https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/"
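The corner-based computation can be sketched as follows. This is a plain-Python illustration under the (x1, y1, x2, y2) convention used above; the function names are chosen for this example only.

```python
def center_to_corners(cx, cy, w, h):
    """Convert (center x, center y, width, height) to (x1, y1, x2, y2)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so boxes that do not overlap give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IoU of 1 means the boxes coincide and an IoU of 0 means they do not overlap at all.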
Non-max Suppression
- Non-max suppression is a common algorithm for cleaning up the output when multiple grid cells predict the same object.
- For example, the model outputs three predictions for the truck in the center. There are three bounding boxes, but we only need one. The thicker the predicted bounding box, the more confident the prediction, which means a higher pc value.
source="https://www.pyimagesearch.com/wp-content/uploads/2014/10/nms_fast_03.jpg"
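A greedy version of non-max suppression can be sketched as below: keep the most confident box, drop every remaining box that overlaps it too much, and repeat. The threshold of 0.5 and the box coordinates are assumptions made for this illustration, and the `iou` helper is repeated so the block runs on its own.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, most confident first."""
    # Visit boxes from highest to lowest confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop boxes that overlap the kept box by more than the threshold.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Three overlapping predictions for one object plus one separate object.
boxes = [(100, 100, 300, 250), (110, 105, 310, 255),
         (95, 90, 290, 240), (400, 100, 500, 200)]
scores = [0.9, 0.75, 0.6, 0.8]
print(non_max_suppression(boxes, scores))
```

Only the most confident of the three overlapping boxes survives, together with the separate box. In a PyTorch pipeline, `torchvision.ops.nms(boxes, scores, iou_threshold)` performs the same kind of filtering on tensors.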
About Author
Niraj Bhattarai
Niraj has good knowledge of Java and Spring Boot. He loves to play chess and believes in teamwork.