Deep learning has recently achieved major breakthroughs in computer vision, driven by supervised training of deep neural networks on large amounts of labeled data. However, annotating a large-scale dataset is expensive and time-consuming. To avoid this costly annotation process, Shrivastava's team at Apple combined existing computer simulation techniques with adversarial training to learn from simulated images without labels: a refiner network makes synthetic images more realistic, while a discriminator tries to distinguish refined images from real ones. The method introduces three innovations: a 'self-regularization' term that keeps the refined image close to the synthetic input so that its annotation is preserved, a local adversarial loss that penalizes artifacts at the level of image patches, and updating the discriminator with a history of refined images so that it does not forget earlier refiner artifacts. Experiments showed that the method generates highly realistic images. The team also evaluated the refined images quantitatively by training a gaze estimation model and a hand pose estimation model: both improved significantly over training on unrefined synthetic images, and the gaze model achieved state-of-the-art results on the MPIIGaze dataset without any labeled real data. However, the researchers did not conduct experiments in complex scenes involving multiple objects, so the applicability of the proposed method to such scenarios remains limited.
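The three ideas summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under my own naming (`ImageHistoryBuffer`, `self_regularization_loss`, `local_adversarial_loss` are hypothetical helpers, not the authors' code), following the loss structure described in the SimGAN paper:

```python
import random
import numpy as np

class ImageHistoryBuffer:
    """Buffer of previously refined images. Half of each discriminator
    minibatch is drawn from this history, so the discriminator does not
    forget artifacts the refiner produced earlier in training."""
    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.buffer = []
        self.rng = rng or random.Random(0)

    def sample_batch(self, refined_batch):
        half = len(refined_batch) // 2
        if len(self.buffer) >= half:
            # replace half of the current batch with old refined images
            old = self.rng.sample(self.buffer, half)
            batch = refined_batch[:len(refined_batch) - half] + old
        else:
            batch = list(refined_batch)
        # store the current refined images, overwriting random slots when full
        for img in refined_batch:
            if len(self.buffer) < self.capacity:
                self.buffer.append(img)
            else:
                self.buffer[self.rng.randrange(self.capacity)] = img
        return batch

def self_regularization_loss(refined, synthetic, lam=1.0):
    """L1 distance between refined and synthetic input: keeps the
    annotation-relevant content (e.g. gaze direction) unchanged."""
    return lam * np.abs(refined - synthetic).sum()

def local_adversarial_loss(patch_logits):
    """Adversarial loss over a grid of local patch logits rather than one
    global score: every patch of the refined image should look 'real'."""
    probs = 1.0 / (1.0 + np.exp(-patch_logits))  # sigmoid per patch
    return -np.log(probs + 1e-8).mean()
```

The refiner is then trained to minimize the sum of the local adversarial loss and the self-regularization term, while the discriminator is updated on minibatches mixed by the history buffer.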
LIU Yong, ZENG Xianfang. From simulation to real: Adversarial learning based data augmentation method[J]. Science & Technology Review, 2018, 36(17): 19-22.
DOI: 10.3981/j.issn.1000-7857.2018.17.003
[1] Shrivastava A, Pfister T, Tuzel O, et al. Learning from simulated and unsupervised images through adversarial training[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 2242-2251.
[2] Goodfellow I J, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C]//International Conference on Neural Information Processing Systems. Boston: MIT Press, 2014: 2672-2680.
[3] Wood E, Baltrušaitis T, Morency L P, et al. Learning an appearance-based gaze estimator from one million synthesised images[C]//Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications. New York: ACM, 2016: 131-138.
[4] Zhang X, Sugano Y, Fritz M, et al. Appearance-based gaze estimation in the wild[C]//IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2015: 2201-2231.
[5] Tompson J, Stein M, Lecun Y, et al. Real-time continuous pose recovery of human hands using convolutional networks[J]. ACM Transactions on Graphics, 2014, 33(5): 1-10.