Science and Technology Commentary

"Making the fake real": a data augmentation method based on adversarial learning

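  • LIU Yong,
  • ZENG Xianfang
  • Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027, China
LIU Yong, professor; research interests: machine learning and robot vision; e-mail: Yongliu@iipc.zju.edu.cn. ZENG Xianfang (co-first author), doctoral candidate; research interest: computer vision; e-mail: zzlongjuanfeng@zju.edu.cn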

Received date: 2018-07-15

  Revised date: 2018-08-15

  Online published: 2018-09-18

From simulation to real: an adversarial learning based data augmentation method



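Cite this article

LIU Yong, ZENG Xianfang. "Making the fake real": a data augmentation method based on adversarial learning[J]. Science & Technology Review, 2018, 36(17): 19-22. DOI: 10.3981/j.issn.1000-7857.2018.17.003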

Abstract

Deep learning has recently achieved major breakthroughs in computer vision. Its success rests on supervised training of deep neural networks with large amounts of labeled data, yet labeling a large-scale dataset is expensive and time-consuming. To address this annotation problem, Apple's Shrivastava team sought to learn from simulated images without supervision by combining existing computer simulation techniques with adversarial training, thereby avoiding the costly image annotation process. Building on adversarial networks, the team proposed three innovations: (1) a self-regularization term; (2) a local adversarial loss; and (3) updating the discriminator using a history of refined images. Together these make the generated images look realistic while preserving the features of the input images. The experimental results showed that the method can generate highly realistic images. The team also evaluated the generated images quantitatively by training a gaze estimation model and a hand pose estimation model. Models trained on the refined images performed significantly better than models trained on the raw synthetic images and achieved state-of-the-art results at the time on the MPIIGaze dataset without any labeled real data. However, the researchers did not conduct experiments on complex scenes containing multiple objects, so the applicability of the proposed method to such scenes remains limited.
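
The three ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function and class names, the weight `lam`, the L1 form of the self-regularization term, the per-patch log loss, and the half-new, half-history batch split are all assumptions made for illustration, and no actual networks are trained here.

```python
import numpy as np


def self_regularization(refined, synthetic, lam=0.5):
    """Assumed form: L1 penalty that keeps the refined image close to its synthetic input."""
    return lam * np.abs(refined - synthetic).sum()


def local_adversarial_loss(patch_real_prob):
    """Assumed form: the discriminator scores each local patch, and the refiner
    is penalized by the summed negative log-probability of 'real' over patches."""
    p = np.clip(patch_real_prob, 1e-12, 1.0)
    return -np.log(p).sum()


def refiner_loss(refined, synthetic, patch_real_prob, lam=0.5):
    """Realism term plus the term that preserves the input image's content."""
    return local_adversarial_loss(patch_real_prob) + self_regularization(
        refined, synthetic, lam
    )


class ImageHistoryBuffer:
    """Fixed-size pool of previously refined images (hypothetical helper)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.images = []
        self.rng = np.random.default_rng(seed)

    def sample(self, n):
        # Draw up to n stored images without replacement.
        n = min(n, len(self.images))
        if n == 0:
            return []
        idx = self.rng.choice(len(self.images), size=n, replace=False)
        return [self.images[i] for i in idx]

    def add(self, new_images):
        # Fill the pool first, then overwrite random slots once it is full.
        for img in new_images:
            if len(self.images) < self.capacity:
                self.images.append(img)
            else:
                self.images[self.rng.integers(self.capacity)] = img


def discriminator_batch(buffer, current_refined):
    """Build the 'fake' half of a discriminator batch: roughly half from the
    current refiner output and half from the history buffer (assumed split)."""
    b = len(current_refined)
    old = buffer.sample(b // 2)
    fresh = list(current_refined[: b - len(old)])
    buffer.add(current_refined[: b // 2])
    return fresh + old
```

Mixing older refined images into each discriminator batch is meant to stop the discriminator from forgetting artifacts the refiner produced earlier in training, while the L1 term keeps annotations of the synthetic input (for example, gaze direction) valid for the refined image.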
