Semi Supervised YOLOv2

Joceran Gouneau, Axel Carlier

Aug 2, 2021 research, internship

Summary

This project was carried out during an internship at IRIT, Toulouse, with Axel Carlier as tutor.

The code provided is only an untested python scripts version of the Google Colab Notebook.

YOLOv2

The fisrt part of this project was about obtaining a functional and trainable model for image detection based on YOLOv2 ¹. This was done by extracting and adapting code snippets from an existing implementation to a Jupyter Notebook.

Results

I obtained, on the WildLife dataset, the following quantitative results :

		YOLOv2 (30 epochs)
validation mAP		0.65
	buffalo mAP	0.59
	elephant mAP	0.59
	rhino mAP	0.77
	zebra mAP	0.64
training mAP		0.73
	buffalo mAP	0.85
	elephant mAP	0.59
	rhino mAP	0.89
	zebra mAP	0.61

And qualitative results :

Semi-supervised Learning for Image Detection

The teacher-student semi-supervised training process.

The second part of this project was the implementation of a semi-supervised training for object detection ² on this functional model. This affected the loss of the model ; initially YOLOv2 has the following loss :

$$ \begin{align} l_{\textrm{yolo}} &= \lambda_{\textrm{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^B \mathbb{1}_{ij}^{\textrm{obj}} \left[ \left( x_{ij} - \hat x_{ij} \right)^2 + \left( y_{ij} - \hat y_{ij} \right)^2 \right] \\ &+ \lambda_{\textrm{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^B \mathbb{1}_{ij}^{\textrm{obj}} \left[ \left( \sqrt w_{ij} - \sqrt{\hat w_{ij}} \right)^2 + \left( \sqrt h_{ij} - \sqrt{\hat h_{ij}} \right)^2 \right] \\ &+ \lambda_{\textrm{obj}} \sum_{i=0}^{S^2} \sum_{j=0}^B \mathbb{1}_{ij}^{\textrm{obj}} \left( C_{ij} - \hat C_{ij} \right)^2 \\ &+ \lambda_{\textrm{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^B \mathbb{1}_{ij}^{\textrm{noobj}} \left( C_{ij} - \hat C_{ij} \right)^2 \\ &+ \lambda_{\textrm{class}} \sum_{i=0}^{S^2} \sum_{j=0}^B \mathbb{1}_{ij}^{\textrm{obj}} \sum_{c \in \textrm{classes}} \left( p_{ij}(c) - \hat p_{ij}(c) \right)^2 \\ \end{align} $$ where $ \hspace{1em} \mathbb{1}_{ij}^{\textrm{noobj}} = 1 - \mathbb{1}_{ij}^{\textrm{obj}} \hspace{1em} \forall (i, j) \in S^2 \times B $

But to in order take into account the pseudo-labels generated by the teacher, I modified the loss as follows :

$$ \begin{align} l_{\textrm{yolo_ssl}} &= \sum_{i=0}^{S^2} \sum_{j=0}^B \left( \mathbb{1}_{ij}^s \lambda_s + \mathbb{1}_{ij}^u \lambda_u \right) l_{ij} \\ l_{ij} = &\lambda_{\textrm{coord}} \mathbb{1}_{ij}^{\textrm{obj}} \left[ \left( x_{ij} - \hat x_{ij} \right)^2 + \left( y_{ij} - \hat y_{ij} \right)^2 \right] \\ + &\lambda_{\textrm{coord}} \mathbb{1}_{ij}^{\textrm{obj}} \left[ \left( \sqrt w_{ij} - \sqrt{\hat w_{ij}} \right)^2 + \left( \sqrt h_{ij} - \sqrt{\hat h_{ij}} \right)^2 \right] \\ + &\left[ \lambda_{\textrm{obj}} \mathbb{1}_{ij}^{\textrm{obj}} (1 - \mathbb{1}_{ij}^{\textrm{noobj}}) + \lambda_{\textrm{noobj}} \mathbb{1}_{ij}^{\textrm{noobj}} \right] \left( C_{ij} - \hat C_{ij} \right)^2 \\ + &\lambda_{\textrm{class}} \mathbb{1}_{ij}^{\textrm{obj}} \sum_{c \in \textrm{classes}} \left( p_{ij}(c) - \hat p_{ij}(c) \right)^2 \\ \end{align} $$ where $ \hspace{1em} \mathbb{1}_{ij}^{\textrm{obj}} = 1 \hspace{1em} $ for all labeled and unlabeled boxes that contain an object, and $ \hspace{1em} \mathbb{1}_{ij}^{\textrm{noobj}} = 1 \hspace{1em} $ for all labeled and unlabeled boxes that does not contain an object and all unlabeled boxes for which the teacher has a confidence below a certain threshold.

Results

I obtained the following results :

	Teacher (60 epochs)	Student (30 epochs)	Student² (30 epochs)
training mAP	0.79	0.82	0.81
validation mAP	0.67	0.70	0.68

Showing proof of concept that the student performs better than the teacher, but also that using the student to train a new student (Student² here) does not seem to make any improvement.

J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, pp. 6517-6525, https://doi.org/10.1109/CVPR.2017.690. ↩︎
Sohn, K., Zhang, Z., Li, C., Zhang, H., Lee, C., & Pfister, T. (2020). A Simple Semi-Supervised Learning Framework for Object Detection. https://arxiv.org/abs/2005.04757. ↩︎

research internship

Semi Supervised YOLOv2

Summary

YOLOv2

Results

Semi-supervised Learning for Image Detection

Results

Joceran Gouneau

CS Engineering Graduate - Listening for opportunities