Semi Supervised YOLOv2

Summary

This project was carried out during an internship at IRIT, Toulouse, with Axel Carlier as tutor.

The code provided here is only an untested conversion of the Google Colab notebook into Python scripts.

YOLOv2

The first part of this project was about obtaining a functional, trainable model for image detection based on YOLOv2 [1]. This was done by extracting code snippets from an existing implementation and adapting them to a Jupyter Notebook.

Results

On the WildLife dataset, I obtained the following quantitative results:

| YOLOv2 (30 epochs) | validation | training |
|--------------------|------------|----------|
| overall mAP        | 0.65       | 0.73     |
| buffalo mAP        | 0.59       | 0.85     |
| elephant mAP       | 0.59       | 0.59     |
| rhino mAP          | 0.77       | 0.89     |
| zebra mAP          | 0.64       | 0.61     |

And the following qualitative results:

2 zebras well detected
3 elephants well detected
only one of the 2 rhinos detected

Semi-supervised Learning for Image Detection

The teacher-student semi-supervised training process.

The second part of this project was the implementation of semi-supervised training for object detection [2] on top of this functional model. This changes the loss of the model. Initially, YOLOv2 has the following loss:

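$$
\begin{aligned}
l_{yolo} ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_{ij} - \hat{x}_{ij})^2 + (y_{ij} - \hat{y}_{ij})^2 \right] \\
&+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (w_{ij} - \hat{w}_{ij})^2 + (h_{ij} - \hat{h}_{ij})^2 \right] \\
&+ \lambda_{obj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} (C_{ij} - \hat{C}_{ij})^2 \\
&+ \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} (C_{ij} - \hat{C}_{ij})^2 \\
&+ \lambda_{class} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \sum_{c \in classes} (p_{ij}(c) - \hat{p}_{ij}(c))^2
\end{aligned}
$$

where $\mathbb{1}_{ij}^{noobj} = 1 - \mathbb{1}_{ij}^{obj} \quad \forall (i,j) \in S^2 \times B$.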
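As an illustration, the loss above can be sketched in NumPy for a single image. This is a minimal sketch, not the notebook's actual implementation: the function name `yolo_loss`, the tensor layout `[x, y, w, h, C, p(c_1), ..., p(c_n)]`, and the default weights of 1.0 are all assumptions made for the example.

```python
import numpy as np

def yolo_loss(pred, target, obj_mask,
              lambda_coord=1.0, lambda_obj=1.0,
              lambda_noobj=1.0, lambda_class=1.0):
    """Sum-of-squares loss for one image, following the formula above.

    pred, target: arrays of shape (S*S, B, 5 + n_classes), laid out as
                  [x, y, w, h, C, p(c_1), ..., p(c_n)] (assumed layout).
    obj_mask:     array of shape (S*S, B), 1 where box (i, j) contains
                  an object and 0 elsewhere.
    The default lambda values are placeholders, not the project's settings.
    """
    obj = obj_mask.astype(float)
    noobj = 1.0 - obj                      # 1^noobj = 1 - 1^obj
    sq = (target - pred) ** 2              # element-wise squared errors

    xy = (obj * sq[..., 0:2].sum(-1)).sum()        # centre coordinates
    wh = (obj * sq[..., 2:4].sum(-1)).sum()        # width and height
    conf_obj = (obj * sq[..., 4]).sum()            # confidence, object boxes
    conf_noobj = (noobj * sq[..., 4]).sum()        # confidence, empty boxes
    cls = (obj * sq[..., 5:].sum(-1)).sum()        # class probabilities

    return float(lambda_coord * (xy + wh)
                 + lambda_obj * conf_obj
                 + lambda_noobj * conf_noobj
                 + lambda_class * cls)
```

With a perfect prediction (`pred` equal to `target`) the function returns 0, and each term only grows with the squared error of the boxes selected by its indicator.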

But in order to take into account the pseudo-labels generated by the teacher, I modified the loss as follows:

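$$
l_{yolo\_ssl} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \left( \mathbb{1}_{ij}^{s} \lambda_{s} + \mathbb{1}_{ij}^{u} \lambda_{u} \right) l_{ij}
$$

$$
\begin{aligned}
l_{ij} ={}& \lambda_{coord} \mathbb{1}_{ij}^{obj} \left[ (x_{ij} - \hat{x}_{ij})^2 + (y_{ij} - \hat{y}_{ij})^2 \right] \\
&+ \lambda_{coord} \mathbb{1}_{ij}^{obj} \left[ (w_{ij} - \hat{w}_{ij})^2 + (h_{ij} - \hat{h}_{ij})^2 \right] \\
&+ \left[ \lambda_{obj} \mathbb{1}_{ij}^{obj} (1 - \mathbb{1}_{ij}^{noobj}) + \lambda_{noobj} \mathbb{1}_{ij}^{noobj} \right] (C_{ij} - \hat{C}_{ij})^2 \\
&+ \lambda_{class} \mathbb{1}_{ij}^{obj} \sum_{c \in classes} (p_{ij}(c) - \hat{p}_{ij}(c))^2
\end{aligned}
$$

where $\mathbb{1}_{ij}^{obj} = 1$ for all labeled and unlabeled boxes that contain an object, and $\mathbb{1}_{ij}^{noobj} = 1$ for all labeled and unlabeled boxes that do not contain an object, as well as for all unlabeled boxes for which the teacher has a confidence below a certain threshold.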
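The modified loss can be sketched in the same way. This is a hedged illustration rather than the notebook's code: the name `yolo_ssl_loss`, the tensor layout, and the default weights are assumptions, and the sketch further assumes that the indicators with superscripts s and u mark boxes from labeled and unlabeled images respectively, so that the second is one minus the first. The two masks for "obj" and "noobj" are passed in separately because, per the definition above, they can both be 1 for an unlabeled box with a low-confidence pseudo-label.

```python
import numpy as np

def yolo_ssl_loss(pred, target, obj_mask, noobj_mask, labeled_mask,
                  lambda_s=1.0, lambda_u=1.0,
                  lambda_coord=1.0, lambda_obj=1.0,
                  lambda_noobj=1.0, lambda_class=1.0):
    """Per-box weighted loss for one image, following the formula above.

    pred, target: arrays of shape (S*S, B, 5 + n_classes), laid out as
                  [x, y, w, h, C, p(c_1), ..., p(c_n)] (assumed layout).
    obj_mask, noobj_mask: arrays of shape (S*S, B); they may overlap.
    labeled_mask: array of shape (S*S, B), 1 for boxes of labeled images
                  (assumed meaning of 1^s; 1^u is taken as 1 - 1^s).
    The default lambda values are placeholders, not the project's settings.
    """
    obj = obj_mask.astype(float)
    noobj = noobj_mask.astype(float)
    labeled = labeled_mask.astype(float)
    sq = (target - pred) ** 2

    # Coordinate terms (x, y, w, h), only for boxes containing an object.
    coord = lambda_coord * obj * sq[..., 0:4].sum(-1)
    # Confidence term: lambda_noobj takes over wherever 1^noobj = 1.
    conf_weight = lambda_obj * obj * (1.0 - noobj) + lambda_noobj * noobj
    conf = conf_weight * sq[..., 4]
    # Class term, only for boxes containing an object.
    cls = lambda_class * obj * sq[..., 5:].sum(-1)

    l_ij = coord + conf + cls
    # Weight each box by lambda_s (labeled) or lambda_u (unlabeled).
    box_weight = lambda_s * labeled + lambda_u * (1.0 - labeled)
    return float((box_weight * l_ij).sum())
```

Under these assumptions, a pseudo-labeled box below the teacher's confidence threshold still contributes its coordinate and class errors, but its confidence error is weighted by `lambda_noobj` instead of `lambda_obj`.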

Results

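I obtained the following results:

|                | Teacher (60 epochs) | Student (30 epochs) | Student² (30 epochs) |
|----------------|---------------------|---------------------|----------------------|
| training mAP   | 0.79                | 0.82                | 0.81                 |
| validation mAP | 0.67                | 0.70                | 0.68                 |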

These results give a proof of concept that the student performs better than the teacher (0.70 vs. 0.67 validation mAP). However, using the student to train a new student (Student² here) does not seem to bring any further improvement.


  1. J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, pp. 6517-6525. https://doi.org/10.1109/CVPR.2017.690

  2. K. Sohn, Z. Zhang, C. Li, H. Zhang, C. Lee, and T. Pfister, "A Simple Semi-Supervised Learning Framework for Object Detection," 2020. https://arxiv.org/abs/2005.04757

Joceran Gouneau